Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit: 

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the specification.

The 3GPP transparent end-to-end packet-switched streaming service (PSS) specification consists of three 3GPP TSs:
3GPP TS 22.233 [1], 3GPP TS 26.233 [2] and the present document. The first TS contains the service requirements for
the PSS, the second TS provides an overview of the 3GPP PSS and the present document specifies the details of the
protocols and codecs used by the service.



Introduction 

Streaming refers to the ability of an application to play synchronised media streams like audio and video streams in a 
continuous way while those streams are being transmitted to the client over a data network. 

Applications, which can be built on top of streaming services, can be classified into on-demand and live information 
delivery applications. Examples of the first category are music and news-on-demand applications. Live delivery of radio 
and television programs are examples of the second category. 

The 3GPP PSS provides a framework for Internet Protocol (IP) based streaming applications in 3G networks. 

1 Scope

The present document specifies the protocols and codecs for the PSS within the 3GPP system. Protocols for control 
signalling, capability exchange, scene description, media transport and media encapsulations are specified. Codecs for 
speech, natural and synthetic audio, video, still images, bitmap graphics, vector graphics, timed text and text are 
specified. 

The present document is applicable to IP based packet switched networks. 

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present
document.

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

3 Definitions and abbreviations

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply: 

continuous media: media with an inherent notion of time. In the present document speech, audio, video and timed text 

discrete media: media that itself does not contain an element of time. In the present document all media not defined as 
continuous media 

device capability description: a description of device capabilities and/or user preferences. Contains a number of 
capability attributes 

device capability profile: same as device capability description 

presentation description: contains information about one or more media streams within a presentation, such as the set 
of encodings, network addresses and information about the content 

PSS client: client for the 3GPP packet switched streaming service based on the IETF RTSP/SDP and/or HTTP 
standards, with possible additional 3GPP requirements according to the present document 

PSS server: server for the 3GPP packet switched streaming service based on the IETF RTSP/SDP and/or HTTP 
standards, with possible additional 3GPP requirements according to the present document 

scene description: description of the spatial layout and temporal behaviour of a presentation. It can also contain 
hyperlinks 

3.2 Abbreviations 

For the purposes of the present document, the abbreviations given in 3GPP TR 21.905 [3] and the following apply. 

AAC Advanced Audio Coding 

BIFS Binary Format for Scenes 

CC/PP Composite Capability / Preference Profiles 

DCT Discrete Cosine Transform 

GIF Graphics Interchange Format

HTML Hyper Text Markup Language 

ITU-T International Telecommunications Union - Telecommunications 

JFIF JPEG File Interchange Format 

MIDI Musical Instrument Digital Interface 

MIME Multipurpose Internet Mail Extensions 

MMS Multimedia Messaging Service 

MP4 MPEG-4 file format 

PNG Portable Network Graphics

PSS Packet-switched Streaming Service 

QCIF Quarter Common Intermediate Format 

RDF Resource Description Framework 

RTCP RTP Control Protocol 

RTP Real-time Transport Protocol 

RTSP Real-Time Streaming Protocol 

SDP Session Description Protocol 

SMIL Synchronised Multimedia Integration Language 

SP-MIDI Scalable Polyphony MIDI 

SVG Scalable Vector Graphics 

UAProf User Agent Profile 

UCS-2 Universal Character Set (the two octet form) 

UTF-8 Unicode Transformation Format (the 8-bit form) 

UTF-16 Unicode Transformation Format (the 16-bit form) 

W3C WWW Consortium 

WML Wireless Markup Language 

XHTML eXtensible Hyper Text Markup Language

XML eXtensible Markup Language

4 System description

Figure 1: Functional components of a PSS client

Figure 1 shows the functional components of a PSS client. Figure 2 gives an overview of the protocol stack used in a
PSS client and also shows a more detailed view of the packet based network interface. The functional components can 
be divided into control, scene description, media codecs and the transport of media and control data. 

The control related elements are session establishment, capability exchange and session control (see clause 5). 

Session establishment refers to methods to invoke a PSS session from a browser or directly by entering a URL
in the terminal's user interface.

Capability exchange enables choice or adaptation of media streams depending on different terminal capabilities. 

Session control deals with the set-up of the individual media streams between a PSS client and one or several 
PSS servers. It also enables control of the individual media streams by the user. It may involve VCR-like 
presentation control functions like start, pause, fast forward and stop of a media presentation. 

The scene description consists of the spatial layout and a description of the temporal relation between the different
media that are included in the media presentation. The former gives the layout of the different media components on the
screen and the latter controls the synchronisation of the different media (see clause 8).

The PSS includes media codecs for video, still images, vector graphics, bitmap graphics, text, timed text, natural and 
synthetic audio, and speech (see clause 7). 

Transport of media and control data consists of the encapsulation of the coded media and control data in a transport 
protocol (see clause 6). This is shown in figure 1 as the "packet based network interface" and displayed in more detail in 
the protocol stack of figure 2. 

Figure 2: Overview of the protocol stack

5 Protocols

5.1 Session establishment

Session establishment refers to the method by which a PSS client obtains the initial session description. The initial
session description can e.g. be a presentation description, a scene description or just a URL to the content.

A PSS client shall support initial session descriptions specified in one of the following formats: SMIL, SDP, or plain 
RTSP URL. 

In addition to rtsp:// the PSS client shall support URLs [4] to valid initial session descriptions starting with file:// (for
locally stored files) and http:// (for presentation descriptions or scene descriptions delivered via HTTP). 

Examples of valid inputs to a PSS client are: file://temp/morning_news.smil, http://mediaportal/morning_news.sdp,
and rtsp://mediaportal/morning_news.
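
The handling of the three URL schemes can be sketched as follows. This is a minimal illustration only: the function name `classify_session_url` is hypothetical, and guessing the description format from the file extension is an assumption of this sketch, not a requirement of the present document.

```python
from urllib.parse import urlsplit

# URL schemes a PSS client shall accept for initial session descriptions.
SUPPORTED_SCHEMES = {"rtsp", "http", "file"}


def classify_session_url(url: str) -> str:
    """Guess the kind of initial session description a URL refers to.

    Returns "rtsp" for a plain RTSP URL, "smil" or "sdp" when the path
    extension suggests that format (a heuristic), otherwise "unknown".
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    if scheme == "rtsp":
        return "rtsp"
    path = parts.path.lower()
    if path.endswith(".smil"):
        return "smil"
    if path.endswith(".sdp"):
        return "sdp"
    return "unknown"
```

When the extension gives no hint, a client would have to inspect the retrieved content (or its MIME type) instead.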

URLs can be made available to a PSS client in many different ways. It is out of the scope of this recommendation to 
mandate any specific mechanism. However, an application using the 3GPP PSS shall at least support URLs of the 
above type, specified or selected by the user. 

The preferred way would be to embed URLs to initial session descriptions within HTML or WML pages. Browser 
applications that support the HTTP protocol could then download the initial session description and pass the content to 
the PSS client for further processing. How exactly this is done is an implementation specific issue and out of the scope 
of this recommendation. 

5.2 Capability exchange 

5.2.1 General 

Capability exchange is an important functionality in the PSS. It enables PSS servers to provide a wide range of devices 
with content suitable for the particular device in question. Another very important task is to provide a smooth transition 
between different releases of PSS. Therefore, PSS clients and servers should support capability exchange. 

The specification of capability exchange for PSS is divided into two parts: a normative part contained in clause 5.2
and an informative part in clause A.4 in Annex A of the present document. The normative part gives all the necessary
requirements that a client or server shall conform to when implementing capability exchange in the PSS. The
informative part provides additional important information for understanding the concept and usage of the functionality.
It is recommended to read clause A.4 in Annex A before continuing with clauses 5.2.2-5.2.7.

5.2.2 The device capability profile structure 

A device capability profile is an RDF [41] document that follows the structure of the CC/PP framework [39] and the
CC/PP application UAProf [40]. Attributes are used to specify device capabilities and preferences. A set of attribute
names, permissible values and semantics constitute a CC/PP vocabulary, which is defined by an RDF schema. For PSS
the UAProf vocabulary is reused and an additional PSS specific vocabulary is defined. The details can be found in
clause 5.2.3. The vocabulary schema defines the syntax of the attributes and, to some extent, their semantics. A
PSS device capability profile is an instance of the schema (UAProf and/or the PSS specific schema) and shall follow the
rules governing the formation of a profile given in the CC/PP specification [39]. The profile schema shall also be
governed by the rules defined in UAProf [40] chapter 7, 7.1, 7.3 and 7.4.

5.2.3 Vocabularies for PSS 

5.2.3.1 General 

Clause 5.2.3 specifies the attribute vocabularies to be used by the PSS capability exchange. 

PSS servers should understand the attributes in both the streaming component of the PSS base vocabulary and the 
recommended attributes from the UAProf vocabulary [40]. A server may additionally support other UAProf attributes. 

5.2.3.2 PSS base vocabulary 

The PSS base vocabulary contains one component called "Streaming". A vocabulary extension to UAProf shall be
defined as an RDF schema. This schema can be found in Annex F. The schema, together with the description of the
attributes in the present clause, defines the vocabulary. The vocabulary is associated with an XML namespace, which
combines a base URI with a local XML element name to yield a URL. Annex F provides the details.

All PSS attributes are put in a PSS specific component called "Streaming". The list of PSS attributes is as follows: 

Attribute name: AudioChannels 

Attribute definition: This attribute describes the stereophonic capability of the natural audio device.

Component: Streaming 

Type: Literal 

Legal values: "Mono", "Stereo" 

Resolution rule: Locked 

EXAMPLE 1: <AudioChannels>Mono</AudioChannels> 



Attribute name: MaxPolyphony

Attribute definition: The MaxPolyphony attribute refers to the maximal polyphony that the synthetic audio device
supports as defined in [44].

NOTE: The MaxPolyphony attribute can be used to signal the maximum polyphony capabilities supported by the PSS
client. This is a complementary mechanism for the delivery of compatible SP-MIDI content and thus the
PSS client is required to support Scalable Polyphony MIDI i.e. Channel Masking defined in [44].

Component: Streaming 

Type: Number 

Legal values: Integer between 5 and 24 

Resolution rule: Locked 

EXAMPLE 2: <MaxPolyphony>8</MaxPolyphony> 



Attribute name: PssAccept

Attribute definition: List of content types (MIME types) the PSS application supports. Both CcppAccept
(SoftwarePlatform, UAProf) and PssAccept can be used but if PssAccept is defined it has
precedence over CcppAccept.



Component: Streaming 

Type: Literal (Bag) 

Legal values: List of MIME types with related parameters. 

Resolution rule: Append 

EXAMPLE 3: <PssAccept>
             <rdf:Bag>
               <rdf:li>audio/AMR-WB; octet-alignment</rdf:li>
               <rdf:li>application/smil</rdf:li>
             </rdf:Bag>
           </PssAccept>



Attribute name: PssAccept-Subset

Attribute definition: List of content types for which the PSS application supports a subset. MIME-types can in most
cases effectively be used to express variations in support for different media types. Many
MIME-types, e.g. AMR-NB, have several parameters that can be used for this purpose. There
may exist content types for which the PSS application only supports a subset and this subset
cannot be expressed with MIME-type parameters. In these cases the attribute PssAccept-Subset
is used to describe support for a subset of a specific content type. If a subset of a
specific content type is declared in PssAccept-Subset, this means that PssAccept-Subset has
precedence over both PssAccept and CcppAccept. PssAccept and/or CcppAccept shall always
include the corresponding content types for which PssAccept-Subset specifies subsets.
This is to ensure compatibility with those content servers that do not understand the
PssAccept-Subset attribute but do understand e.g. CcppAccept.

This is illustrated with an example. If PssAccept="audio/AMR", "image/jpeg" and
PssAccept-Subset="JPEG-PSS" then "audio/AMR" and JPEG Base line are supported. "image/jpeg" in
PssAccept is of no importance since it is related to "JPEG-PSS" in PssAccept-Subset. Subset
identifiers and corresponding semantics shall only be defined by the TSG responsible for the
present document. The following values are defined:

- "JPEG-PSS": Only the two JPEG modes described in clause 7.5 of the present document
are supported.

- "SVG-Tiny"

- "SVG-Basic"

Component: Streaming

Type: Literal (Bag)

Legal values: "JPEG-PSS", "SVG-Tiny", "SVG-Basic"

Resolution rule: Append

EXAMPLE 4: <PssAccept-Subset>
             <rdf:Bag>
               <rdf:li>JPEG-PSS</rdf:li>
             </rdf:Bag>
           </PssAccept-Subset>
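
The precedence between CcppAccept, PssAccept and PssAccept-Subset can be sketched as below. The function name `effective_support` and the table `SUBSET_TO_MIME` are hypothetical; only the relation between "JPEG-PSS" and "image/jpeg" is taken from the example in the text, and the content types of the SVG subsets are deliberately left out.

```python
# Relation between a subset identifier and the content type it restricts.
# Only the JPEG-PSS entry is taken from the text; this table is illustrative.
SUBSET_TO_MIME = {"JPEG-PSS": "image/jpeg"}


def effective_support(ccpp_accept, pss_accept, pss_accept_subset):
    """Map each accepted content type to "full" or to a subset identifier.

    PssAccept, when defined (non-empty), takes precedence over CcppAccept.
    PssAccept-Subset takes precedence over both for the related content type.
    """
    base = pss_accept if pss_accept else ccpp_accept
    restricted = {
        SUBSET_TO_MIME[subset]: subset
        for subset in pss_accept_subset
        if subset in SUBSET_TO_MIME
    }
    return {mime: restricted.get(mime, "full") for mime in base}
```

With PssAccept = "audio/AMR", "image/jpeg" and PssAccept-Subset = "JPEG-PSS", the sketch reports full support for "audio/AMR" and the "JPEG-PSS" subset for "image/jpeg", which matches the example in the attribute definition.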



Attribute name: PssVersion

Attribute definition: PSS version supported by the client. 

Component: Streaming 

Type: Literal 

Legal values: "3GPP-R4", "3GPP-R5" and so forth. 

Resolution rule: Locked 

EXAMPLE 5: <PssVersion>3GPP-R4</PssVersion> 

Attribute name: RenderingScreenSize 

Attribute definition: The rendering size of the device's screen in units of pixels. The horizontal size is given
followed by the vertical size.

Component: Streaming

Type: Dimension

Legal values: Two integer values equal to or greater than zero. A value equal to "0x0" means that there exists no
possibility to render visual PSS presentations.

Resolution rule: Locked

EXAMPLE 6: <RenderingScreenSize>70x15</RenderingScreenSize>
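
A server-side reading of this attribute might look like the following sketch. The function names are hypothetical; the only rules applied are those stated above (horizontal size first, both values zero or greater, "0x0" meaning no visual rendering).

```python
from typing import Tuple


def parse_rendering_screen_size(value: str) -> Tuple[int, int]:
    """Parse a Dimension literal such as "70x15" into (horizontal, vertical)."""
    width_text, height_text = value.strip().lower().split("x")
    width, height = int(width_text), int(height_text)
    if width < 0 or height < 0:
        raise ValueError("screen dimensions shall be zero or greater")
    return width, height


def can_render_visual(value: str) -> bool:
    """A value of "0x0" signals that visual PSS presentations cannot be rendered."""
    return parse_rendering_screen_size(value) != (0, 0)
```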

Attribute name: SmilBaseSet 

Attribute definition: Indicates a base set of SMIL 2.0 modules that the client supports.

Component: Streaming 

Type: Literal 

Legal values: Pre-defined identifiers. "SMIL-3GPP-R4" indicates all SMIL 2.0 modules required for scene
description support according to clause 8 of Release 4 of TS 26.234. "SMIL-3GPP-R5"
indicates all SMIL 2.0 modules required for scene description support according to clause 8 of
the present document (Release 5 of TS 26.234).

Resolution rule: Locked 

EXAMPLE 7: <SmilBaseSet>SMIL-3GPP-R4</SmilBaseSet> 



Attribute name: SmilModules



Attribute definition: This attribute defines a list of SMIL 2.0 modules supported by the client. If the SmilBaseSet is 
used those modules do not need to be explicitly listed here. In that case only additional module 
support needs to be listed. 

Component: Streaming 

Type: Literal (Bag) 

Legal values: SMIL 2.0 module names defined in the SMIL 2.0 recommendation [31], section 2.3.3, table 2. 

Resolution rule: Append 

EXAMPLE 8: <SmilModules>
             <rdf:Bag>
               <rdf:li>BasicTransitions</rdf:li>
               <rdf:li>MultiArcTiming</rdf:li>
             </rdf:Bag>
           </SmilModules>



Attribute name: VideoDecodingByteRate 

Attribute definition: If Annex G is not supported, the attribute has no meaning. If Annex G is supported, this
attribute defines the peak decoding byte rate the PSS client is able to support. In other words,
the PSS client fulfils the requirements given in Annex G with the signalled peak decoding byte
rate. The values are given in bytes per second and shall be greater than or equal to 8000.
According to Annex G, 8000 is the default peak decoding byte rate for the mandatory video
codec profile and level (H.263 Profile 0 Level 10).

Component: Streaming 

Type: Number 

Legal values: Integer value greater than or equal to 8000. 

Resolution rule: Locked 

EXAMPLE 9: <VideoDecodingByteRate>16000</VideoDecodingByteRate> 



Attribute name: VideoInitialPostDecoderBufferingPeriod 

Attribute definition: If Annex G is not supported, the attribute has no meaning. If Annex G is supported, this 
attribute defines the maximum initial post-decoder buffering period of video. Values are 
interpreted as clock ticks of a 90-kHz clock. In other words, the value is incremented by one 
for each 1/90 000 seconds. For example, the value 9000 corresponds to 1/10 of a second initial 
post-decoder buffering. 

Component: Streaming

Type: Number 

Legal values: Integer value equal to or greater than zero. 

Resolution rule: Locked 

EXAMPLE 10: <VideoInitialPostDecoderBufferingPeriod>9000</VideoInitialPostDecoderBufferingPeriod>
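
The relation between 90-kHz clock ticks and seconds can be checked with a short sketch. The helper name `ticks_to_seconds` is hypothetical; exact fractions are used so that no rounding is introduced.

```python
from fractions import Fraction

CLOCK_HZ = 90_000  # buffering periods are expressed in ticks of a 90-kHz clock


def ticks_to_seconds(ticks: int) -> Fraction:
    """Convert a buffering period in clock ticks to seconds."""
    if ticks < 0:
        raise ValueError("the buffering period shall be zero or greater")
    return Fraction(ticks, CLOCK_HZ)
```

For the value in EXAMPLE 10, `ticks_to_seconds(9000)` gives 1/10 of a second, as stated in the attribute definition.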

Attribute name: VideoPreDecoderBufferSize 

Attribute definition: This attribute signals if the optional video buffering requirements defined in Annex G are
supported. It also defines the size of the hypothetical pre-decoder buffer defined in Annex G. A
value equal to zero means that Annex G is not supported. A value equal to one means that
Annex G is supported. In this case the size of the buffer is the default size defined in Annex G.
A value equal to or greater than the default buffer size defined in Annex G means that Annex
G is supported and sets the buffer size to the given number of octets.

Component: Streaming 

Type: Number 

Legal values: Integer value equal to or greater than zero. Values greater than one but less than the default
buffer size defined in Annex G are not allowed.

Resolution rule: Locked 

EXAMPLE 11: <VideoPreDecoderBufferSize>30720</VideoPreDecoderBufferSize>
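
The three cases of the VideoPreDecoderBufferSize attribute can be sketched as follows. The function name is hypothetical, and the default buffer size is passed in as a parameter because its value is defined in Annex G and not reproduced here.

```python
from typing import Optional


def interpret_pre_decoder_buffer_size(value: int, default_size: int) -> Optional[int]:
    """Return the pre-decoder buffer size in octets, or None if Annex G is unsupported.

    value        -- the signalled VideoPreDecoderBufferSize
    default_size -- the Annex G default buffer size (supplied by the caller)
    """
    if value == 0:
        return None            # Annex G is not supported
    if value == 1:
        return default_size    # Annex G is supported with the default size
    if value >= default_size:
        return value           # Annex G is supported with the signalled size
    raise ValueError("values between one and the default buffer size are not allowed")
```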

5.2.3.3 Attributes from UAProf 

In the UAProf vocabulary [40] there are several attributes that are of interest for the PSS. The formal definition of these 
attributes is given in [40]. The following list of attributes is recommended for PSS applications: 

Attribute name: BitsPerPixel 

Component: HardwarePlatform 

Attribute description: The number of bits of colour or greyscale information per pixel 

EXAMPLE 1: <BitsPerPixel>8</BitsPerPixel> 



Attribute name: ColorCapable 

Component: HardwarePlatform 

Attribute description: Whether the device display supports colour or not. 

EXAMPLE 2: <ColorCapable>Yes</ColorCapable> 

Attribute name: PixelAspectRatio 

Component: HardwarePlatform 

Attribute description: Ratio of pixel width to pixel height 

EXAMPLE 3: <PixelAspectRatio>1x2</PixelAspectRatio>

Attribute name: PointingResolution

Component: HardwarePlatform 

Attribute description: Type of resolution of the pointing accessory supported by the device. 

EXAMPLE 4: <PointingResolution>Pixel</PointingResolution> 

Attribute name: Model 

Component: HardwarePlatform 

Attribute description: Model number assigned to the terminal device by the vendor or manufacturer 

EXAMPLE 5: <Model>Lexus</Model> 

Attribute name: Vendor 

Component: HardwarePlatform 

Attribute description: Name of the vendor manufacturing the terminal device 

EXAMPLE 6: <Vendor>Toyota</Vendor> 

Attribute name: CcppAccept-Charset 

Component: SoftwarePlatform 

Attribute description: List of character sets the device supports 

EXAMPLE 7: <CcppAccept-Charset>
             <rdf:Bag>
               <rdf:li>UTF-8</rdf:li>
             </rdf:Bag>
           </CcppAccept-Charset>

Attribute name: CcppAccept-Encoding 

Component: SoftwarePlatform 

Attribute description: List of transfer encodings the device supports 

EXAMPLE 8: <CcppAccept-Encoding>
             <rdf:Bag>
               <rdf:li>base64</rdf:li>
             </rdf:Bag>
           </CcppAccept-Encoding>

Attribute name: CcppAccept-Language 

Component: SoftwarePlatform 

Attribute description: List of preferred document languages 

EXAMPLE 9: <CcppAccept-Language>
             <rdf:Seq>
               <rdf:li>en</rdf:li>
               <rdf:li>se</rdf:li>
             </rdf:Seq>
           </CcppAccept-Language>

5.2.4 Extensions to the PSS schema/vocabulary 

The use of RDF enables an extensibility mechanism for CC/PP-based schemas that addresses the evolution of new types 
of devices and applications. The PSS profile schema specification is going to provide a base vocabulary but in the 
future new usage scenarios might have need for expressing new attributes. If the base vocabulary is updated a new 
unique namespace will be assigned to the updated schema. The base vocabulary shall only be changed by the TSG 
responsible for the present document. All extensions to the profile schema shall be governed by the rules defined in [40] 
clause 7.7. 

5.2.5 Signalling of profile information between client and server 

When a PSS client or server supports capability exchange it shall support the profile information transport over both
HTTP and RTSP between client and server as defined in clause 9.1 (including its subsections) of the WAP 2.0 UAProf
specification [40] with the following additions:

- The "x-wap-profile" and "x-wap-profile-diff" headers may not be present in all HTTP or RTSP requests. That is,
the requirement to send these headers in all requests has been relaxed.

- The defined headers may be applied to both RTSP and HTTP.

- The "x-wap-profile-diff" header is only valid for the current request. The reason is that PSS does not have the
WSP session concept of WAP.

- Push is not relevant for the PSS.

The following recommendations are made as to how and when profile information should be sent between client and
server:

- PSS content servers supporting capability exchange shall be able to receive profile information in all HTTP and
RTSP requests.

- The terminal should not send the "x-wap-profile-diff" header over the air-interface since there is no compression
scheme defined.

- RTSP: the client should send profile information in the DESCRIBE message. It may send it in any other request.

If the terminal has some prior knowledge about the file type it is about to retrieve, e.g. file extensions, the following
applies:

- HTTP and SDP: when retrieving an SDP with HTTP the client should include profile information in the GET
request. This way the HTTP server can deliver an optimised SDP to the client.

- HTTP and SMIL: when retrieving a SMIL file with HTTP the client should include profile information in the
GET request. This way the HTTP server can deliver an optimised SMIL presentation to the client. A SMIL
presentation can include links to static media. The server should optimise the SMIL file so that links to the
referenced static media are adapted to the requesting client. When the "x-wap-profile-warning" indicates that
content selection has been applied (201-203) the PSS client should assume that no more capability exchange has
to be performed for the static media components. In this case it should not send any profile information when
retrieving static media to be included in the SMIL presentation. This will minimise the HTTP header overhead.
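
As an illustration of the RTSP recommendation above, the sketch below assembles a DESCRIBE request that carries profile information. The function name and all URLs are hypothetical, and the header value layout (a comma-separated list of quoted profile URLs) is an assumption based on the UAProf header format in [40].

```python
from typing import List


def build_describe_request(url: str, profile_urls: List[str], cseq: int = 1) -> str:
    """Build an RTSP DESCRIBE request with an "x-wap-profile" header.

    The profile URLs are written as quoted strings separated by commas
    (assumed UAProf layout); the request carries no message body.
    """
    profile_value = ", ".join(f'"{profile_url}"' for profile_url in profile_urls)
    lines = [
        f"DESCRIBE {url} RTSP/1.0",
        f"CSeq: {cseq}",
        "Accept: application/sdp",
        f"x-wap-profile: {profile_value}",
        "",  # blank line terminating the header section
        "",
    ]
    return "\r\n".join(lines)


# Hypothetical content and profile locations, for illustration only.
request = build_describe_request(
    "rtsp://mediaportal.example/morning_news",
    ["http://profiles.example/terminal-model.rdf"],
)
```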

5.2.6 Merging device capability profiles 

Profiles need to be merged whenever the PSS server receives multiple device capability profiles. Multiple occurrences 
of attributes and default values make it necessary to resolve the profiles according to a resolution process. 

The resolution process shall be the same as defined in UAProf [40] clause 6.4.1. 

- Resolve all indirect references by retrieving URI references contained within the profile.

- Resolve each profile and profile-diff document by first applying attribute values contained in the default URI
references and by second applying overriding attribute values contained within the category blocks of that profile
or profile-diff.

- Determine the final value of the attributes by applying the resolved attribute values from each profile and
profile-diff in order, with the attribute values determined by the resolution rules provided in the schema. Where no
resolution rules are provided for a particular attribute in the schema, values provided in profiles or profile-diffs
are assumed to override values provided in previous profiles or profile-diffs.

When several URLs are defined in the "x-wap-profile" header and there exists any attribute that occurs more than once
in these profiles the rule is that the attribute value in the second URL overrides, or is overridden by, or is appended to
the attribute value from the first URL (according to the resolution rule) and so forth. This is what is meant by
"Determine the final value of the attributes by applying the resolved attribute values from each profile and profile-diff
in order, with..." in the third bullet above. If the profile is completely or partly inaccessible or otherwise corrupted the
server should still provide content to the client. The server is responsible for delivering content optimised for the client
based on the received profile in a best effort manner.
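
The final merging step can be sketched as follows, assuming that profiles have already been resolved into plain dictionaries. The function name and the attribute "ExampleAttribute" are hypothetical. The sketch also assumes the UAProf semantics that "Locked" keeps the first value, "Override" takes the latest value and "Append" concatenates lists; an attribute without a resolution rule is overridden, as stated above.

```python
from typing import Any, Dict, List


def merge_profiles(profiles: List[Dict[str, Any]],
                   resolution_rules: Dict[str, str]) -> Dict[str, Any]:
    """Merge resolved profiles in order according to per-attribute resolution rules."""
    merged: Dict[str, Any] = {}
    for profile in profiles:
        for name, value in profile.items():
            rule = resolution_rules.get(name, "Override")
            if name not in merged:
                # First occurrence: copy lists so the inputs are not modified later.
                merged[name] = list(value) if rule == "Append" else value
            elif rule == "Locked":
                continue  # the first value is kept
            elif rule == "Append":
                merged[name] = merged[name] + list(value)
            else:
                merged[name] = value  # a later profile overrides an earlier one
    return merged


rules = {"PssVersion": "Locked", "PssAccept": "Append"}
first = {"PssVersion": "3GPP-R4", "PssAccept": ["audio/AMR"], "ExampleAttribute": "a"}
second = {"PssVersion": "3GPP-R5", "PssAccept": ["application/smil"], "ExampleAttribute": "b"}
merged = merge_profiles([first, second], rules)
```

Here `merged` keeps "3GPP-R4" for the locked PssVersion, lists both content types for PssAccept, and takes "b" for the attribute without a rule.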

NOTE: For the reasons explained in Annex A clause A.4.3 the usage of indirect references in profiles (using the 
CC/PP defaults element) is not recommended. 

5.2.7 Profile transfer between the PSS server and the device profile server

The device capability profiles are stored on a device profile server and referenced with URLs. According to the profile 
resolution process in clause 5.2.6 of the present document, the PSS server ends up with a number of URLs referring to 
profiles and these shall be retrieved. 

The device profile server shall support HTTP 1.1 for the transfer of device capability profiles to the PSS server. 

If the PSS server supports capability exchange it shall support HTTP 1.1 for transfer of device capability profiles 
from the device profile server. A URL shall be used to identify a device capability profile. 

Normal content caching provisions as defined by HTTP apply. 

5.3 Session set-up and control 

5.3.1 General 

Continuous media is media that has an intrinsic time line. Discrete media on the other hand does not itself contain an
element of time. In this specification speech, audio and video belong to the first category and still images and text to
the latter one.

Streaming of continuous media using RTP/UDP/IP (see clause 6.2) requires a session control protocol to set up and
control the individual media streams. For the transport of discrete media (images and text), vector graphics, timed
text and synthetic audio this specification adopts the use of HTTP/TCP/IP (see clause 6.3). In this case there is no need
for a separate session set-up and control protocol since this is built into HTTP. This clause describes session set-up and
control of the continuous media speech, audio and video.

5.3.2 RTSP 

RTSP [5] shall be used for session set-up and session control. PSS clients and servers shall follow the rules for minimal 
on-demand playback RTSP implementations in appendix D of [5]. In addition to this: 

- PSS servers and clients shall implement the DESCRIBE method (see clause 10.2 in [5]);

- PSS servers and clients shall implement the Range header field (see clause 12.29 in [5]);

- PSS servers shall include the Range header field in all PLAY responses.

5.3.3 SDP

5.3.3.1 General 

RTSP requires a presentation description. SDP shall be used as the format of the presentation description for both PSS 
clients and servers. PSS servers shall provide and clients interpret the SDP syntax according to the SDP specification 
[6] and appendix C of [5]. The SDP delivered to the PSS client shall declare the media types to be used in the session
using a codec specific MIME media type for each media. MIME media types to be used in the SDP file are described in 
clause 5.4 of the present document. 

The SDP [6] specification requires certain fields to always be included in an SDP file. Apart from this a PSS server 
shall always include the following fields in the SDP: 

- "a=control:" according to clauses C.1.1, C.2 and C.3 in [5];

- "a=range:" according to clause C.1.5 in [5];

- "a=rtpmap:" according to clause 6 in [6];

- "a=fmtp:" according to clause 6 in [6].

The bandwidth field in SDP is needed by the client in order to properly set up QoS parameters. Therefore, a PSS server 
shall include the "b=AS:" field at the media level for each media stream in SDP, and a PSS client shall interpret this 
field. When a client receives SDP, it should ignore the session level "b=AS:" parameter (if present), and instead 
calculate session bandwidth from the media level bandwidth values of the relevant streams. Note that for RTP based
applications, "b=AS:" gives the RTP "session bandwidth" (including UDP/IP overhead) as defined in section 6.2 of [9].

NOTE: The SDP parsers and/or interpreters shall be able to accept NULL values in the 'c=' field (e.g. 0.0.0.0 in IPv4 
case). This may happen when the media content does not have a fixed destination address. For more 
details, see Section C.1.7 of [5] and Section 6 of [6]. 
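
The bandwidth calculation recommended above can be sketched as follows. The function names and the sample SDP are hypothetical; the sketch simply sums the media level "b=AS:" values of all streams and leaves the selection of the relevant streams to the caller.

```python
from typing import List, Optional


def media_bandwidths(sdp: str) -> List[Optional[int]]:
    """Return the media level "b=AS:" value of each media stream, in SDP order.

    A session level "b=AS:" line (one appearing before the first "m=" line)
    is ignored. A stream without a "b=AS:" line yields None.
    """
    values: List[Optional[int]] = []
    in_media_section = False
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            in_media_section = True
            values.append(None)
        elif in_media_section and line.startswith("b=AS:"):
            values[-1] = int(line[len("b=AS:"):])
    return values


def session_bandwidth(sdp: str) -> int:
    """Sum the media level bandwidth values found in the description."""
    return sum(value for value in media_bandwidths(sdp) if value is not None)


# Illustrative description; the addresses, payload types and values are made up.
EXAMPLE_SDP = """v=0
s=Example
c=IN IP4 0.0.0.0
b=AS:999
m=audio 0 RTP/AVP 97
b=AS:13
m=video 0 RTP/AVP 98
b=AS:64
"""
```

For `EXAMPLE_SDP` the session level value 999 is ignored and the media level values 13 and 64 are added.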

5.3.3.2 Additional SDP fields 

The following Annex G-related media level SDP fields are defined for PSS: 

"a=X-predecbufsize:<size of the hypothetical pre-decoder buffer>" 

This gives the suggested size of the Annex G hypothetical pre-decoder buffer in bytes. 

"a=X-initpredecbufperiod:<initial pre-decoder buffering period>" 

This gives the required initial pre-decoder buffering period specified according to Annex G. Values are 
interpreted as clock ticks of a 90-kHz clock. That is, the value is incremented by one for each 1/90 000 seconds. 
For example, value 180 000 corresponds to a two second initial pre-decoder buffering. 

"a=X-initpostdecbufperiod:<initial post-decoder buffering period>" 

This gives the required initial post-decoder buffering period specified according to Annex G. Values are 
interpreted as clock ticks of a 90-kHz clock. 

"a=X-decbyterate:<peak decoding byte rate>" 

This gives the peak decoding byte rate that was used to verify the compatibility of the stream with Annex G. 
Values are given in bytes per second. 

If none of the attributes "a=X-predecbufsize:", "a=X-initpredecbufperiod:", "a=X-initpostdecbufperiod:", and 
"a=X-decbyterate:" is present, clients should not expect a packet stream according to Annex G. If at least one of the listed 
attributes is present, the transmitted video packet stream shall conform to Annex G. If at least one of the listed attributes 
is present, but some of the listed attributes are missing in an SDP description, clients should expect a default value for 
the missing attributes according to Annex G. 
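
As an illustration of the 90-kHz tick convention and of the presence rule above, the following Python sketch collects the four attributes from the "a=" lines of one media description. The helper names are hypothetical, and the Annex G default values are not reproduced here: missing attributes are simply absent from the returned dictionary.

```python
CLOCK_HZ = 90_000  # buffering periods are expressed in ticks of a 90-kHz clock

ANNEX_G_ATTRIBUTES = (
    "X-predecbufsize",         # bytes
    "X-initpredecbufperiod",   # 90-kHz clock ticks
    "X-initpostdecbufperiod",  # 90-kHz clock ticks
    "X-decbyterate",           # bytes per second
)


def annex_g_parameters(attribute_lines):
    """Collect the Annex G related attributes from the "a=" lines of one media section."""
    found = {}
    for line in attribute_lines:
        if not line.startswith("a="):
            continue
        name, separator, value = line[2:].partition(":")
        if separator and name in ANNEX_G_ATTRIBUTES:
            found[name] = int(value)
    return found


def ticks_to_seconds(ticks):
    """Convert a buffering period from 90-kHz clock ticks to seconds."""
    return ticks / CLOCK_HZ


def stream_follows_annex_g(parameters):
    """At least one listed attribute present means the stream shall conform to Annex G."""
    return bool(parameters)
```

For example, ticks_to_seconds(180000) gives 2.0, matching the two-second example in the text.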

5.4 MIME media types 

For continuous media (speech, audio and video) the following MIME media types shall be used: 

- AMR narrow-band speech codec (see clause 7.2) MIME media type as defined in [11]; 

- AMR wideband speech codec (see clause 7.2) MIME media type as defined in [11]; 

- MPEG-4 AAC audio codec (see clause 7.3) MIME media type as defined in RFC 3016 [13]. When used in SDP 
the attribute "cpresent" SHALL be set to "0" indicating that the configuration information is only carried out of 
band in the SDP "config" parameter; 

- MPEG-4 video codec (see clause 7.4) MIME media type as defined in RFC 3016 [13]. When used in SDP the 
configuration information shall be carried out of band in the "config" SDP parameter and in band (as stated in RFC 
3016). As described in RFC 3016, the configuration information sent in band and the config information in the 
SDP shall be the same, except that first_half_vbv_occupancy and latter_half_vbv_occupancy, if present, may 
vary in the configuration information sent in band; 

- H.263 [22] video codec (see clause 7.4) MIME media type as defined in annex C, clause C.1 of the present 
document. 

MIME media types for JPEG, GIF, PNG, SP-MIDI, SVG, timed text and XHTML can be used both in the "Content- 
type" field in HTTP and in the "type" attribute in SMIL 2.0. The following MIME media types shall be used for these 
media: 

- JPEG (see clause 7.5) MIME media type as defined in [15]; 

- GIF (see clause 7.6) MIME media type as defined in [15]; 

- PNG (see clause 7.6) MIME media type as defined in [38]; 

- SP-MIDI (see clause 7.3A) MIME media type as defined in clause C.2 in Annex C of the present document; 

- SVG (see clause 7.7) MIME media type as defined in [42]; 

- XHTML (see clause 7.8) MIME media type as defined in [16]; 

- Timed text (see clause 7.9) MIME media type as defined in clause D.9 in Annex D of the present document. 

MIME media type used for SMIL files shall be according to [31] and for SDP files according to [6]. 



6 Data transport 

6.1 Packet based network interface 

PSS clients and servers shall support an IP-based network interface for the transport of session control and media data. 
Control and media data are sent using TCP/IP [8] and UDP/IP [7]. An overview of the protocol stack can be found in 
figure 2 of the present document. 

6.2 RTP over UDP/IP 

The IETF RTP [9] and [10] provides means for sending real-time or streaming data over UDP (see [7]). The encoded 
media is encapsulated in the RTP packets with media specific RTP payload formats. RTP payload formats are defined 
by IETF. RTP also provides a protocol called RTCP (see clause 6 in [9]) for feedback about the transmission quality. 
For the calculation of the RTCP transmission interval Annex A.7 in [9] shall be used. Clause A.3.2.3 in Annex A of the 
present document provides more information about the minimum RTCP transmission interval. 
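
The following Python sketch shows the general shape of an RTCP interval computation. It is a simplified illustration only: the constants (a 5-second minimum, halved for the first report, a 25 % bandwidth share for senders, and a randomisation factor between 0.5 and 1.5) are assumptions of this sketch, the function names are hypothetical, and Annex A.7 of [9] remains the authoritative definition.

```python
import random

RTCP_MIN_TIME = 5.0        # assumed minimum interval between reports, in seconds
SENDER_BW_FRACTION = 0.25  # assumed share of the RTCP bandwidth reserved for senders


def deterministic_rtcp_interval(members, senders, rtcp_bw, we_sent,
                                avg_rtcp_size, initial=False):
    """Return the interval in seconds before randomisation.

    rtcp_bw is the RTCP bandwidth in bytes per second and avg_rtcp_size the
    average RTCP packet size in bytes.
    """
    min_time = RTCP_MIN_TIME / 2 if initial else RTCP_MIN_TIME
    n = members
    # When senders are a small minority, they share a dedicated bandwidth slice.
    if 0 < senders < members * SENDER_BW_FRACTION:
        if we_sent:
            rtcp_bw *= SENDER_BW_FRACTION
            n = senders
        else:
            rtcp_bw *= 1 - SENDER_BW_FRACTION
            n -= senders
    return max(avg_rtcp_size * n / rtcp_bw, min_time)


def randomized_rtcp_interval(interval, rng=random):
    """Scale the interval by a random factor in [0.5, 1.5) to avoid synchronisation."""
    return interval * (rng.random() + 0.5)
```

In a unicast PSS session with one server and one client, the computed value is normally below the minimum, so the minimum interval dominates.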

RTP/UDP/IP transport of continuous media (speech, audio and video) shall be supported. 

For RTP/UDP/IP transport of continuous media the following RTP payload formats shall be used: 

- AMR narrow-band speech codec (see clause 7.2) RTP payload format according to [11]. A PSS client is not 
required to support multi-channel sessions; 

- AMR wideband speech codec (see clause 7.2) RTP payload format according to [11]. A PSS client is not 
required to support multi-channel sessions; 

- MPEG-4 AAC audio codec (see clause 7.3) RTP payload format according to RFC 3016 [13]; 

- MPEG-4 video codec (see clause 7.4) RTP payload format according to RFC 3016 [13]; 

- H.263 video codec (see clause 7.4) RTP payload format according to RFC 2429 [14]. 

NOTE: The RFC 3016 payload format for MPEG-4 AAC specifies that the audio streams shall be formatted by the 
LATM (Low-overhead MPEG-4 Audio Transport Multiplex) tool [21]. It should be noted that the 
references for the LATM format in RFC 3016 [13] point to an older version of the LATM format than 
the one included in [21]. A corrigendum to the LATM tool is included in [21]. This corrigendum changes 
the LATM format, making implementations that use the corrigendum incompatible with implementations 
that do not. To avoid future interoperability problems, implementations of PSS clients and servers 
supporting AAC shall follow the changes to the LATM format included in [21]. 

6.3 HTTP over TCP/IP 

The IETF TCP provides reliable transport of data over IP networks, but with no delay guarantees. It is the preferred way 
for sending the scene description, text, bitmap graphics and still images. There is also need for an application protocol 
to control the transfer. The IETF HTTP [17] provides this functionality. 

HTTP/TCP/IP transport shall be supported for: 

still images (see clause 7.5); 

bitmap graphics (see clause 7.6); 

synthetic audio (see clause 7.3A); 

vector graphics (see clause 7.7); 

text (see clause 7.8); 

timed text (see clause 7.9); 

scene description (see clause 8); 

presentation description (see clause 5.3.3). 

6.4 Transport of RTSP 

Transport of RTSP shall be supported according to RFC 2326 [5]. 

7 Codecs 

7.1 General 

For PSS offering a particular media type, media decoders are specified in the following clauses. 



7.2 Speech 



The AMR decoder shall be supported for narrow-band speech [18]. The AMR wideband speech decoder [20] shall be 
supported when wideband speech working at 16 kHz sampling frequency is supported. 

7.3 Audio 

MPEG-4 AAC Low Complexity (AAC-LC) object type decoder [21] should be supported. The maximum sampling rate 
to be supported by the decoder is 48 kHz. The channel configurations to be supported are mono (1/0) and stereo (2/0). 
In addition, the MPEG-4 AAC Long Term Prediction (AAC-LTP) object type decoder may be supported. 

When a server offers an AAC-LC or AAC-LTP stream with the specified restrictions, it shall include the "profile-level- 
id" and "object" MIME parameters in the SDP "a=fmtp" line. The following values shall be used: 



| Object Type | profile-level-id | object |
|-------------|------------------|--------|
| AAC-LC      | 15               | 2      |
| AAC-LTP     | 15               | 4      |



7.3A Synthetic audio 



The Scalable Polyphony MIDI (SP-MIDI) content format defined in Scalable Polyphony MIDI Specification [44] and 
the device requirements defined in Scalable Polyphony MIDI Device 5-to-24 Note Profile for 3GPP [45] should be 
supported. 

SP-MIDI content is delivered in the structure specified in Standard MIDI Files 1.0 [46], either in format 0 or format 1. 

7.4 Video 

ITU-T Recommendation H.263 [22] profile 0 level 10 shall be supported. This is the mandatory video decoder for the 
PSS. In addition, PSS should support: 

- H.263 [23] Profile 3 Level 10 decoder; 

- MPEG-4 Visual Simple Profile Level 0 decoder, [24] and [25]. 

These two video decoders are optional to implement. 

An optional video buffer model is given in Annex G of the present document. 

NOTE: ITU-T Recommendation H.263 [22] baseline has been mandated to ensure that video-enabled PSS 
support a minimum baseline video capability and interoperability can be guaranteed (an H.263 [22] 
baseline bitstream can be decoded by both H.263 [22] and MPEG-4 decoders). It also provides a simple 
upgrade path for mandating more advanced decoders in the future (from both the ITU-T and ISO MPEG). 



7.5 Still images 



ISO/IEC JPEG [26] together with JFIF [27] decoders shall be supported. The support for ISO/IEC JPEG applies only to 
the following two modes: 

- baseline DCT, non-differential, Huffman coding, as defined in table B.1, symbol 'SOF0' in [26]; 

- progressive DCT, non-differential, Huffman coding, as defined in table B.1, symbol 'SOF2' in [26]. 

7.6 Bitmap graphics 

The following bitmap graphics decoders should be supported: 

- GIF87a, [32]; 

- GIF89a, [33]; 

- PNG, [38]. 

7.7 Vector graphics 

The SVG Tiny profile [42] [43] shall be supported. In addition SVG Basic profile [42] [43] may be supported. 

7.8 Text 

The text decoder is intended to enable formatted text in a SMIL presentation. A PSS client shall support 

- text formatted according to XHTML Mobile Profile [47]; 

- rendering a SMIL presentation where text is referenced with the SMIL 2.0 "text" element together with the SMIL 
2.0 "src" attribute. 

The following character coding formats shall be supported: 

- UTF-8, [30]; 

- UCS-2, [29]. 

NOTE: Since both SMIL and XHTML are XML based languages it would be possible to define a SMIL plus 
XHTML profile. In contrast to the presently defined PSS4 SMIL Language Profile, which only contains SMIL 
modules, such a profile would also contain XHTML modules. No combined SMIL and XHTML profile is 
specified for PSS. Rendering of such documents is out of the scope of the present document. 

7.9 Timed text 

If timed text is supported, PSS clients shall support Annex D, clause D.8a, of this specification. There is no support for 
RTP transport of timed text in this release; 3GPP (MP4) files containing timed text may only be downloaded. 

NOTE: When a PSS client supports timed text it needs to be able to receive and parse 3GPP (MP4) files 
containing the text streams. This does not imply a requirement on PSS clients to be able to render other 
continuous media types contained in 3GPP (MP4) files, e.g. AMR and H.263, if such media types are 
included in a presentation together with timed text. Audio and video are instead streamed to the client 
using RTSP/RTP (see clause 6.2). 



8 Scene description 

8.1 General 

The 3GPP PSS uses a subset of SMIL 2.0 [31] as format of the scene description. PSS clients and servers with support 
for scene descriptions shall support the 3GPP PSS SMIL Language Profile defined in clause 8.2 (abbreviated 3GPP PSS 
SMIL). This profile is a subset of the SMIL 2.0 Language Profile, but a superset of the SMIL 2.0 Basic Language 
Profile. The present document also includes an informative Annex B that provides guidelines for SMIL content authors. 

NOTE: The interpretation of this is not that all streaming sessions are required to use SMIL. For some types of 
sessions, e.g. consisting of one single continuous media or two media synchronised by using RTP 
timestamps, SMIL may not be needed. 

8.2 3GPP PSS SMIL Language Profile 
8.2.1 Introduction 

3GPP PSS SMIL is a markup language based on SMIL Basic [31] and SMIL Scalability Framework. 

3GPP PSS SMIL consists of the modules required by SMIL Basic Profile (and SMIL 2.0 Host Language Conformance) 
and additional MediaAccessibility, MediaDescription, MediaClipping, Metainformation, PrefetchControl, EventTiming 
and BasicTransitions modules. All of the following modules are included: 

- SMIL 2.0 Content Control Modules - BasicContentControl, SkipContentControl and PrefetchControl 

- SMIL 2.0 Layout Module - BasicLayout 

- SMIL 2.0 Linking Module - BasicLinking 

- SMIL 2.0 Media Object Modules - BasicMedia, MediaClipping, MediaAccessibility and MediaDescription 

- SMIL 2.0 Metainformation Module - Metainformation 

- SMIL 2.0 Structure Module - Structure 

- SMIL 2.0 Timing and Synchronization Modules - BasicInlineTiming, MinMaxTiming, BasicTimeContainers, 
RepeatTiming and EventTiming 

- SMIL 2.0 Transition Effects Module - BasicTransitions 

8.2.2 Document Conformance 

A conforming 3GPP PSS SMIL document shall be a conforming SMIL 2.0 document. 
All 3GPP PSS SMIL documents use the SMIL 2.0 namespace: 

<smil xmlns="http://www.w3.org/2001/SMIL20/Language"> 

3GPP PSS SMIL documents may declare requirements using systemRequired attribute: 

EXAMPLE 1: <smil xmlns="http://www.w3.org/2001/SMIL20/Language" 
                 xmlns:EventTiming="http://www.w3.org/2000/SMIL20/CR/EventTiming" 
                 systemRequired="EventTiming"> 

Namespace URI http://www.3gpp.org/SMIL20/PSS5/ identifies the version of the 3GPP PSS SMIL profile described in 
the present document. Authors may use this URI to indicate requirement for exact 3GPP PSS SMIL semantics for a 
document or a subpart of a document: 

EXAMPLE 2: <smil xmlns="http://www.w3.org/2001/SMIL20/Language" 
                 xmlns:pss5="http://www.3gpp.org/SMIL20/PSS5/" 
                 systemRequired="pss5"> 

The content authors should generally not include the PSS requirement in the document unless the SMIL document relies 
on PSS specific semantics that are not part of the W3C SMIL. The reason for this is that SMIL players that are not 
conforming 3GPP PSS user agents may not recognize the PSS URI and thus refuse to play the document. 

8.2.3 User Agent Conformance 

A conforming 3GPP PSS SMIL user agent shall be a conforming SMIL Basic User Agent. 

A conforming user agent shall implement the semantics of 3GPP PSS SMIL as described in clauses 8.2.4 and 8.2.5 
(including subclauses). 

A conforming user agent shall recognise 

- the URIs of all included SMIL 2.0 modules; 

- the URI http://www.3gpp.org/SMIL20/PSS5/ as referring to all modules and semantics of the version of the 
3GPP PSS SMIL profile described in the present document; 

- the URI http://www.3gpp.org/SMIL20/PSS4/ as referring to all modules and semantics of the 3GPP PSS SMIL 
profile defined in Release 4 of the present document. 

NOTE: The difference between PSS4 and PSS5 is that the BasicTransitions module has been added in PSS5. 
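
A user agent could check the systemRequired attribute of the root element along the lines of the following Python sketch. It is an illustration under stated assumptions: the function name is hypothetical, the attribute value is treated as a list of namespace prefixes separated by "+", and RECOGNISED_URIS lists only the URIs quoted in this clause rather than the URIs of all included SMIL 2.0 modules.

```python
import io
import xml.etree.ElementTree as ET

# Partial, illustrative set: a real user agent would list every included module URI.
RECOGNISED_URIS = {
    "http://www.3gpp.org/SMIL20/PSS5/",
    "http://www.3gpp.org/SMIL20/PSS4/",
    "http://www.w3.org/2000/SMIL20/CR/EventTiming",
}


def unsupported_requirements(smil_text, recognised=RECOGNISED_URIS):
    """Return the systemRequired prefixes whose namespace URI is not recognised."""
    prefixes = {}
    root = None
    stream = io.BytesIO(smil_text.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "start")):
        if event == "start-ns":
            # Declarations on the root element are reported before its start event.
            if root is None:
                prefix, uri = payload
                prefixes[prefix] = uri
        elif root is None:
            root = payload
    required = root.get("systemRequired", "")
    names = [name for name in required.split("+") if name]
    return [name for name in names if prefixes.get(name) not in recognised]
```

An empty result means that every declared requirement maps to a recognised URI; a player that does not recognise the PSS URI would instead see the "pss5" prefix reported here and could refuse to play the document.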

8.2.4 3GPP PSS SMIL Language Profile definition 

3GPP PSS SMIL is based on SMIL 2.0 Basic language profile [31]. This chapter defines the content model and 
integration semantics of the included modules where they differ from those defined by SMIL Basic. 

8.2.4.1 Content Control Modules 

3GPP PSS SMIL includes the content control functionality of the BasicContentControl, SkipContentControl and 
PrefetchControl modules of SMIL 2.0. PrefetchControl is not part of SMIL Basic and is an additional module in this 
profile. 

All BasicContentControl attributes listed in the module specification shall be supported. 

NOTE: The SMIL specification [31] defines all functionality of the PrefetchControl module as optional. This 
means that, even though PrefetchControl is mandatory, user agents may implement the semantics of the 
PrefetchControl module only partially or not at all. 

PrefetchControl module adds the prefetch element to the content model of SMIL Basic body, switch, par and seq 
elements. The prefetch element has the attributes defined by the PrefetchControl module (mediaSize, mediaTime and 
bandwidth), the src attribute, the BasicContentControl attributes and the skip-content attribute. 

8.2.4.2 Layout Module 

3GPP PSS SMIL includes the BasicLayout module of SMIL 2.0 for spatial layout. The module is part of SMIL Basic. 
Default values of the width and height attributes for root-layout shall be the dimensions of the device display area. 

8.2.4.3 Linking Module 

3GPP PSS SMIL includes the SMIL 2.0 BasicLinking module for providing hyperlinks between documents and 
document fragments. This module is from SMIL Basic. 

When linking to destinations outside the current document, implementations may ignore values "play" and "pause" of 
the 'sourcePlaystate' attribute and values "new" and "pause" of the 'show' attribute, instead using the semantics of values 
"stop" and "replace" respectively. When the values of 'sourcePlaystate' and 'show' are ignored the player may also 
ignore the 'sourceLevel' attribute since it is of no use then. 

8.2.4.4 Media Object Modules 

3GPP PSS SMIL includes the media elements from the SMIL 2.0 BasicMedia module and attributes from the 
MediaAccessibility, MediaDescription and MediaClipping modules. The MediaAccessibility, MediaDescription and 
MediaClipping modules are additions in this profile to SMIL Basic. 

See clause 5.4 for the mandatory and optional MIME types that a 3GPP PSS SMIL player needs to support. 

MediaClipping module adds to the profile the ability to address sub-clips of continuous media. MediaClipping module 
adds 'clipBegin' and 'clipEnd'(and for compatibility 'clip-begin' and 'clip-end') attributes to all media elements. 

MediaAccessibility module provides basic accessibility support for media elements. New attributes 'alt', 'longdesc' and 
'readIndex' are added to all media elements by this module. MediaDescription module is included by the 
MediaAccessibility module and adds 'abstract', 'author' and 'copyright' attributes to media elements. 

8.2.4.5 Metainformation Module 

The Metainformation module of SMIL 2.0 is included in the profile. This module is an addition in this profile to SMIL 
Basic and provides a way to include descriptive information about the document content in the document. 

This module adds meta and metadata elements to the content model of SMIL Basic head element. 

8.2.4.6 Structure Module 

The Structure module defines the top-level structure of the document. It is included by SMIL Basic. 

8.2.4.7 Timing and Synchronization modules 

The timing modules included in the 3GPP SMIL are BasicInlineTiming, MinMaxTiming, BasicTimeContainers, 
RepeatTiming and EventTiming. The EventTiming module is an addition in this profile to the SMIL Basic. 

For 'begin' and 'end' attributes either single offset-value or single event-value shall be allowed. Offsets shall not be 
supported with event-values. 

Event timing attributes that reference invalid IDs (for example elements that have been removed by the content control) 
shall be treated as being indefinite. 

Supported event names and semantics shall be as defined by the SMIL 2.0 Language Profile. All user agents shall be 
able to raise the following event types: 

- activateEvent; 

- beginEvent; 

- endEvent. 

The following SMIL 2.0 Language event types should be supported: 

- focusInEvent; 

- focusOutEvent; 

- inBoundsEvent; 

- outBoundsEvent; 

- repeatEvent. 

User agents shall ignore unknown event types and not treat them as errors. 

Events do not bubble and shall be delivered to the associated media or timed elements only. 
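
The restrictions on 'begin' and 'end' values can be sketched as a small classifier in Python. This is a rough illustration, not a complete SMIL clock-value parser: it accepts only simple timecount offsets such as "5s" or "+2.5s", and the function name and result labels are hypothetical.

```python
import re

# Event types named in this clause (mandatory and recommended).
KNOWN_EVENTS = {
    "activateEvent", "beginEvent", "endEvent",
    "focusInEvent", "focusOutEvent", "inBoundsEvent",
    "outBoundsEvent", "repeatEvent",
}

# Simplified offset syntax: optional sign, number, optional unit.
_OFFSET = re.compile(r"[+-]?\s*\d+(?:\.\d+)?(?:h|min|s|ms)?")
# Optional element id followed by an event name, with no trailing offset.
_EVENT = re.compile(r"(?:(?P<base>[A-Za-z_][\w\-]*)\.)?(?P<name>[A-Za-z]+)")


def classify_timing_value(value):
    """Classify a 'begin' or 'end' attribute value.

    Returns "offset", "event", "unknown-event" (to be ignored, not treated as
    an error) or "unsupported" (lists, event plus offset, other forms).
    """
    value = value.strip()
    if _OFFSET.fullmatch(value):
        return "offset"
    match = _EVENT.fullmatch(value)
    if match:
        if match.group("name") in KNOWN_EVENTS:
            return "event"
        if match.group("base"):
            return "unknown-event"
    return "unsupported"
```

A value such as "video1.endEvent+5s" is reported as "unsupported" because offsets are not supported together with event-values in this profile.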

8.2.4.8 Transition Effects Module 

3GPP PSS SMIL profile includes the SMIL 2.0 BasicTransitions module to provide a framework for describing 
transitions between media elements. 

NOTE: The SMIL specification [31] defines all functionality of the BasicTransitions module as optional: 
"Transitions are hints to the presentation. Implementations must be able to ignore transitions if they so 
desire and still play the media of the presentation". This means that, even though the BasicTransitions 
module is mandatory, user agents may implement the semantics of the BasicTransitions module only partially 
or not at all. Content authors should use transitions in their SMIL presentation where 
this appears useful. User agents that fully support the semantics of the BasicTransitions module will 
render the presentation with the specified transitions. All other user agents will leave out the transitions 
but present the media content correctly. 

User agents that implement the semantics of this module should implement at least the following transition effects 
described in SMIL 2.0 specification [31]: 

barWipe; 

irisWipe; 

clockWipe; 

snakeWipe; 

pushWipe; 

slideWipe; 

- fade; 

A user agent should implement the default subtype of these transition effects. 

A user agent that implements the semantics of this module shall at least support transition effects for non-animated 
image media elements. For purposes of the Transition Effects modules, two media elements are considered overlapping 
when they occupy the same region. 

BasicTransitions module adds attributes 'transIn' and 'transOut' to the media elements of the Media Objects modules, 
and value "transition" to the set of legal values for the 'fill' attribute of the media elements. It also adds transition 
element to the content model of the head element. 

8.2.5 Content Model 

This table shows the full content model and attributes of the 3GPP PSS SMIL profile. The attribute collections used are 
defined by SMIL Basic ([31], SMIL Host Language Conformance requirements, chapter 2.4). Changes to SMIL Basic 
are shown in bold. 

Table 1: Content model for the 3GPP PSS SMIL profile 

| Element | Elements | Attributes |
|---------|----------|------------|
| smil | head, body | COMMON-ATTRS, CONTCTRL-ATTRS, xmlns |
| head | layout, switch, meta, metadata, transition | COMMON-ATTRS |
| body | TIMING-ELMS, MEDIA-ELMS, switch, a, prefetch | COMMON-ATTRS |
| layout | root-layout, region | COMMON-ATTRS, CONTCTRL-ATTRS, type |
| root-layout | EMPTY | COMMON-ATTRS, backgroundColor, height, width, skip-content |
| region | EMPTY | COMMON-ATTRS, backgroundColor, bottom, fit, height, left, right, showBackground, top, width, z-index, skip-content, regionName |
| ref, animation, audio, img, video, text, textstream | area | COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, repeat, region, MEDIA-ATTRS, clipBegin(clip-begin), clipEnd(clip-end), alt, longDesc, readIndex, abstract, author, copyright, transIn, transOut |
| a | MEDIA-ELMS | COMMON-ATTRS, LINKING-ATTRS |
| area | EMPTY | COMMON-ATTRS, LINKING-ATTRS, TIMING-ATTRS, repeat, shape, coords, nohref |
| par, seq | TIMING-ELMS, MEDIA-ELMS, switch, a, prefetch | COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, repeat |
| switch | TIMING-ELMS, MEDIA-ELMS, layout, a, prefetch | COMMON-ATTRS, CONTCTRL-ATTRS |
| prefetch | EMPTY | COMMON-ATTRS, CONTCTRL-ATTRS, mediaSize, mediaTime, bandwidth, src, skip-content |
| meta | EMPTY | COMMON-ATTRS, content, name, skip-content |
| metadata | EMPTY | COMMON-ATTRS, skip-content |
| transition | EMPTY | COMMON-ATTRS, CONTCTRL-ATTRS, type, subtype, startProgress, endProgress, direction, fadeColor, skip-content |


9 Interchange format for MMS 

9.1 General 



The MPEG-4 file format [34] is mandated in [35] to be used for continuous media along the entire delivery chain 
envisaged by the MMS, independent of whether the final delivery is done by streaming or download, thus enhancing 
interoperability. 

In particular, the following stages are considered: 

upload from the originating terminal to the MMS proxy; 

file exchange between MMS servers; 

transfer of the media content to the receiving terminal, either by file download or by streaming. In the first case 
the self-contained file is transferred, whereas in the second case the content is extracted from the file and 
streamed according to open payload formats. In this case, no trace of the file format remains in the content that 
goes on the wire/in the air. 

Additionally, the MPEG-4 file format should be used for the storage in the servers and the "hint track" mechanism may 
be used for the preparation for streaming. 

Clause 9.2 of the present document gives the necessary requirements to follow for the MPEG-4 file format used in 
MMS. These requirements will guarantee PSS to interwork with MMS as well as the MPEG-4 file format to be used 
internally within the MMS system. For PSS servers not interworking with MMS there is no requirement to follow these 
guidelines. 

9.2 File format guidelines 

NOTE: The file format used in this specification for timed multimedia (such as video, associated audio and timed 
text) is structurally based on the MP4 file format as defined in [34]. However, since non-ISO codecs are 
used here, it is called the 3GPP file format and has its own file extension and MIME type to distinguish 
these files from MPEG-4 files. When this specification refers to the MP4 file format, it is referring to its 
structure (ISO file format), not to its conformance definition. 

9.2.1 Registration of non-ISO codecs 

How to include the non-ISO code streams AMR narrow-band speech, AMR wideband speech, H.263 encoded video 
and timed text in MP4 files is described in annex D of the present document. 

9.2.2 Hint tracks 

The hint tracks are a mechanism that the server implementation may choose to use in preparation for the streaming of 
media content contained in MP4 files. However, it should be observed that the usage of the hint tracks is an internal 
implementation matter for the server, and it falls outside the scope of the present document. 

9.2.3 Self-contained MP4 files 

All media in the MP4 file shall be self-contained, i.e. there shall not be referencing to external media data from inside 
the MP4 file. 



9.2.4 MPEG-4 systems specific elements 

Tracks relating to MPEG-4 system architectural elements (e.g. BIFS scene description tracks or OD Object descriptors) 
are optional and shall be ignored. The adoption of the MPEG-4 file format does not imply the usage of MPEG-4 
systems architecture. The receiving terminal is not required to implement any of the specific MPEG-4 system 
architectural elements. 

9.2.5 Interpretation of MPEG-4 file format 

All index numbers used in MPEG-4 file format start with the value one rather than zero, in particular "first-chunk" in 
Sample to chunk atom, "sample-number" in Sync sample atom and "shadowed-sample-number", "sync-sample- 
number" in Shadow sync sample atom. 
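
The one-based convention matters when a sample number is mapped to a chunk. The Python sketch below assumes a simplified view of the Sample to chunk atom as a list of (first-chunk, samples-per-chunk) pairs, with each entry applying up to the chunk before the next entry's first-chunk; the function name and this data layout are illustrative only.

```python
def chunk_for_sample(stsc_entries, total_chunks, sample_number):
    """Map a one-based sample number to (chunk number, position in chunk).

    stsc_entries is a list of (first_chunk, samples_per_chunk) pairs, with
    first_chunk counted from one. Both returned values are also one-based.
    """
    if sample_number < 1:
        raise ValueError("sample numbers start at one, not zero")
    first_sample = 1  # one-based number of the first sample covered by the run
    for index, (first_chunk, samples_per_chunk) in enumerate(stsc_entries):
        if index + 1 < len(stsc_entries):
            last_chunk = stsc_entries[index + 1][0] - 1
        else:
            last_chunk = total_chunks
        run_samples = (last_chunk - first_chunk + 1) * samples_per_chunk
        if sample_number < first_sample + run_samples:
            offset = sample_number - first_sample  # zero-based inside the run
            chunk = first_chunk + offset // samples_per_chunk
            position = offset % samples_per_chunk + 1
            return chunk, position
        first_sample += run_samples
    raise ValueError("sample number beyond the last chunk")
```

With entries [(1, 3), (3, 2)] and four chunks, samples 1 to 6 fall in chunks 1 and 2, and samples 7 to 10 in chunks 3 and 4. A parser that treated "first-chunk" as zero-based would shift every lookup by one chunk.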

Annex A (informative): 
Protocols 

A.1 SDP 

This clause gives some background information on SDP for PSS clients. 

Table A.1 provides an overview of the different SDP fields that can be identified in an SDP file. The order of SDP fields 
is mandated as specified in RFC 2327 [6]. 

Table A.1: Overview of fields in SDP for PSS clients 

| Type | Description | Requirement according to [6] | Requirement according to the present document |
|------|-------------|------------------------------|-----------------------------------------------|
| Session Description | | | |
| V | Protocol version | R | R |
| O | Owner/creator and session identifier | R | R |
| S | Session Name | R | R |
| I | Session information | O | O |
| U | URI of description | O | O |
| E | Email address | O | O |
| P | Phone number | O | O |
| C | Connection Information | R | R |
| B | Bandwidth information: AS | O | O |
| One or more Time Descriptions (see below) | | | |
| Z | Time zone adjustments | O | O |
| K | Encryption key | O | O |
| A | Session attributes: control | O | R |
| A | Session attributes: range | O | R |
| One or more Media Descriptions (see below) | | | |
| Time Description | | | |
| T | Time the session is active | R | R |
| R | Repeat times | O | O |
| Media Description | | | |
| M | Media name and transport address | R | R |
| I | Media title | O | O |
| C | Connection information | R | R |
| B | Bandwidth information: AS | O | R |
| K | Encryption Key | O | O |
| A | Attribute Lines: control | O | R |
| A | Attribute Lines: range | O | R |
| A | Attribute Lines: fmtp | O | R |
| A | Attribute Lines: rtpmap | O | R |
| A | Attribute Lines: X-predecbufsize | ND | O |
| A | Attribute Lines: X-initpredecbufperiod | ND | O |
| A | Attribute Lines: X-initpostdecbufperiod | ND | O |
| A | Attribute Lines: X-decbyterate | ND | O |

Note 1: R = Required, O = Optional, ND = Not Defined 

Note 2: The "c" type is only required on the session level if not present on the media level. 

Note 3: The "c" type is only required on the media level if not present on the session level. 

Note 4: According to RFC 2327, either an 'e' or 'p' field must be present in the SDP description. On the 
other hand, both fields will be made optional in the future release of SDP. So, for the sake 
of robustness and maximum interoperability, either an 'e' or 'p' field shall be present during 
the server's SDP file creation, but the client should also be ready to receive SDP content 
containing neither 'e' nor 'p' fields. 



The example below shows an SDP file that could be sent to a PSS client to initiate unicast streaming of a H.263 video 
sequence. 

EXAMPLE: 
    v=0 
    o=ghost 2890844526 2890842807 IN IP4 192.168.10.10 
    s=3GPP Unicast SDP Example 
    i=Example of Unicast SDP file 
    u=http://www.infoserver.com/ae600 
    e=ghost@mailserver.com 
    c=IN IP4 0.0.0.0 
    t=0 0 
    a=range:npt=0-45.678 
    m=video 1024 RTP/AVP 96 
    b=AS:128 
    a=rtpmap:96 H263-2000/90000 
    a=fmtp:96 profile=3;level=10 
    a=control:rtsp://mediaserver.com/movie 
    a=recvonly 



A.2 RTSP 



A.2.1 General 

Clause 5.3.2 of the present document defines the required RTSP support in PSS clients and servers by making 
references to Appendix D of [5]. The current clause gives an overview of the methods (see Table A.2) and headers (see 
Table A.3) that are specified in the referenced Appendix D. An example of an RTSP session is also given. 

Table A.2: Overview of the required RTSP method support 

| Method | Requirement for a minimal on-demand playback client according to [5] | Requirement for a PSS client according to the present document | Requirement for a minimal on-demand playback server according to [5] | Requirement for a PSS server according to the present document |
|--------|----------|----------|----------|----------|
| OPTIONS | O | O | Respond | Respond |
| REDIRECT | Respond | Respond | O | O |
| DESCRIBE | O | Generate | O | Respond |
| SETUP | Generate | Generate | Respond | Respond |
| PLAY | Generate | Generate | Respond | Respond |
| PAUSE | Generate | Generate | Respond | Respond |
| TEARDOWN | Generate | Generate | Respond | Respond |

NOTE 1: O = Support is optional 

NOTE 2: 'Generate' means that the client/server is required to generate the request where applicable. 

NOTE 3: 'Respond' means that the client/server is required to properly respond to the request. 



ETSI 



3GPP TS 26.234 version 5.2.0 Release 5 






ETSI TS 126 234 V5.2.0 (2002-09) 



Table A.3: Overview of the required RTSP header support 

| Header | Requirement for a minimal on-demand playback client according to [5] | Requirement for a PSS client according to the present document | Requirement for a minimal on-demand playback server according to [5] | Requirement for a PSS server according to the present document |
|--------|----------|----------|----------|----------|
| Connection | include/understand | include/understand | include/understand | include/understand |
| Content-Encoding | understand | understand | include | include |
| Content-Language | understand | understand | include | include |
| Content-Length | understand | understand | include | include |
| Content-Type | understand | understand | include | include |
| CSeq | include/understand | include/understand | include/understand | include/understand |
| Location | understand | understand | O | O |
| Public | O | O | include | include |
| Range | O | include/understand | understand | include/understand |
| Require | O | O | understand | understand |
| RTP-Info | understand | understand | include | include |
| Session | include | include | understand | understand |
| Timestamp | O | O | include/understand | include/understand |
| Transport | include/understand | include/understand | include/understand | include/understand |
| User-Agent (see NOTE 4) | O | O | O | O |

NOTE 1: O = Support is optional 

NOTE 2: 'include' means that the client/server is required to include the header in a request or response where 
applicable. 

NOTE 3: 'understand' means that the client/server is required to be able to respond properly if the header is received in 
a request or response. 

NOTE 4: According to [5] the "User-Agent" header is not strictly required for a minimal RTSP client implementation, 
although it is highly recommended that it is included with requests. The same applies to a PSS client 
according to the present document. 



The example below is intended to give some more understanding of how RTSP and SDP are used within the 3GPP PSS. 
The example assumes that the streaming client has the RTSP URL to a presentation consisting of an H.263 video 
sequence and AMR speech. RTSP messages sent from the client to the server are in bold and messages from the server 
to the client in italic. In the example the server provides aggregate control of the two streams. 



EXAMPLE: 



DESCRIBE rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 1
User-Agent: TheStreamClient/1.1b2

RTSP/1.0 200 OK
CSeq: 1
Content-Type: application/sdp
Content-Length: 435

v=0
o=- 950814089 950814089 IN IP4 144.132.134.67
s=Example of aggregate control of AMR speech and H.263 video
e=foo@bar.com
c=IN IP4 0.0.0.0
b=AS:77
t=0 0
a=range:npt=0-59.3478
a=control:*
m=audio 0 RTP/AVP 97
b=AS:13
a=rtpmap:97 AMR/8000
a=fmtp:97
a=maxptime:200
a=control:streamID=0
m=video 0 RTP/AVP 98
b=AS:64
a=rtpmap:98 H263-2000/90000
a=fmtp:98 profile=3;level=10
a=control:streamID=1



SETUP rtsp://mediaserver.com/movie.test/streamID=0 RTSP/1.0
CSeq: 2
Transport: RTP/AVP/UDP;unicast;client_port=3456-3457
User-Agent: TheStreamClient/1.1b2

RTSP/1.0 200 OK
CSeq: 2
Transport: RTP/AVP/UDP;unicast;client_port=3456-3457;server_port=5678-5679
Session: dfhyrio90llk

SETUP rtsp://mediaserver.com/movie.test/streamID=1 RTSP/1.0
CSeq: 3
Transport: RTP/AVP/UDP;unicast;client_port=3458-3459
Session: dfhyrio90llk
User-Agent: TheStreamClient/1.1b2

RTSP/1.0 200 OK
CSeq: 3
Transport: RTP/AVP/UDP;unicast;client_port=3458-3459;server_port=5680-5681
Session: dfhyrio90llk

PLAY rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 4
Session: dfhyrio90llk
User-Agent: TheStreamClient/1.1b2

RTSP/1.0 200 OK
CSeq: 4
Session: dfhyrio90llk
Range: npt=0-
RTP-Info: url=rtsp://mediaserver.com/movie.test/streamID=0;seq=9900;rtptime=4470048,
 url=rtsp://mediaserver.com/movie.test/streamID=1;seq=1004;rtptime=1070549

NOTE: Headers can be folded onto multiple lines if the continuation line begins with a space or 
horizontal tab. For more information, see RFC2616 [17]. 

The user watches the movie for 20 seconds and then decides to fast forward to 10 seconds before 
the end... 

PAUSE rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 5
Session: dfhyrio90llk
User-Agent: TheStreamClient/1.1b2

PLAY rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 6
Range: npt=50-59.3478
Session: dfhyrio90llk
User-Agent: TheStreamClient/1.1b2

RTSP/1.0 200 OK
CSeq: 5
Session: dfhyrio90llk

RTSP/1.0 200 OK
CSeq: 6
Session: dfhyrio90llk
Range: npt=50-59.3478
RTP-Info: url=rtsp://mediaserver.com/movie.test/streamID=0;seq=39900;rtptime=44470648,
 url=rtsp://mediaserver.com/movie.test/streamID=1;seq=31004;rtptime=41090349



After the movie is over the client issues a TEARDOWN to end the session. 



TEARDOWN rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 7
Session: dfhyrio90llk
User-Agent: TheStreamClient/1.1b2

RTSP/1.0 200 OK
CSeq: 7
Session: dfhyrio90llk
Connection: close
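
The RTP-Info header in the PLAY responses above can be read on the client side roughly as follows. This is a minimal sketch, not part of the specification: the function names unfold_headers and parse_rtp_info are hypothetical, and the simple splitting assumes that the stream URLs contain no "," or ";" characters.

```python
def unfold_headers(raw):
    """Join continuation lines (starting with space or tab) onto the previous header line."""
    lines = []
    for line in raw.splitlines():
        if line[:1] in (" ", "\t") and lines:
            lines[-1] += " " + line.strip()
        elif line.strip():
            lines.append(line.strip())
    return lines


def parse_rtp_info(value):
    """Split an RTP-Info header value into one dict per stream.

    Assumption: URLs contain no ',' or ';', so plain splitting is enough.
    """
    streams = []
    for entry in value.split(","):
        fields = {}
        for part in entry.split(";"):
            # Split on the first '=' only, so 'streamID=0' inside the URL survives.
            key, _, val = part.strip().partition("=")
            if key:
                fields[key] = val.strip()
        if fields:
            streams.append(fields)
    return streams


raw = (
    "RTP-Info: url=rtsp://mediaserver.com/movie.test/streamID=0;seq=9900;rtptime=4470048,\r\n"
    " url=rtsp://mediaserver.com/movie.test/streamID=1;seq=1004;rtptime=1070549\r\n"
)
header = unfold_headers(raw)[0]
streams = parse_rtp_info(header.partition(":")[2])
```

Each entry in streams then carries the url, seq and rtptime values for one media stream, which the client can use to align the first RTP packet of each stream with the NPT given in the Range header.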

A.2.2 Implementation guidelines 
A.2.2.1 Usage of persistent TCP 

Considering the potentially long round-trip-delays in a packet switched streaming service over UMTS it is important to 
keep the number of messages exchanged between a server and a client low. The number of requests and responses 
exchanged is one of the factors that will determine how long it takes from the time that a user initiates PSS until the
streams start playing in the client.

RTSP methods are sent over either TCP or UDP for IP. Both client and server shall support RTSP over TCP whereas
RTSP over UDP is optional. For TCP the connection can be persistent or non-persistent. A persistent connection is used
for several RTSP request/response pairs whereas one connection is used per RTSP request/response pair for the
non-persistent connection. In the non-persistent case each connection will start with the three-way handshake (SYN,
SYN-ACK, ACK) before the RTSP request can be sent. This will increase the time for the message to be sent by one
round trip delay.

For these reasons it is recommended that 3GPP PSS clients should use a persistent TCP connection, at least for the 
initial RTSP methods until media starts streaming. 
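
The effect of the extra handshake can be estimated with a small calculation. The sketch below is illustrative only: the function name setup_delay is hypothetical, and the model assumes that every request/response pair costs exactly one round trip, that each new TCP connection costs one additional round trip, and that processing time, pipelining and connection teardown are ignored.

```python
def setup_delay(rtt_s, request_pairs, persistent):
    """Estimated signalling time in seconds before media can start.

    Each RTSP request/response pair costs one round trip. Opening a TCP
    connection costs one additional round trip for the handshake before
    the first request on that connection can be sent.
    """
    connections = 1 if persistent else request_pairs
    return rtt_s * (request_pairs + connections)


# DESCRIBE, two SETUPs and PLAY over a link with a 0.5 s round-trip delay.
persistent_delay = setup_delay(0.5, 4, persistent=True)
non_persistent_delay = setup_delay(0.5, 4, persistent=False)
```

Under these assumptions the four-message set-up from the example in clause A.2.1 takes 5 round trips on a persistent connection and 8 round trips when a new connection is opened for every request.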

A.2.2.2 Detecting link aliveness

In the wireless environment, the connection may be lost due to fading, shadowing, loss of battery power, or turning off
the terminal even though the PSS session is active. In order for the server to be able to detect the client's aliveness,
the PSS client should send "wellness" information to the PSS server at a defined interval as described in RFC 2326.
RFC 2326 describes several ways of detecting link aliveness. However, the client should be careful about
issuing "PLAY method without Range header field" too close to the end of the streams, because it may conflict with
pipelined PLAY requests. Below is the list of recommended "wellness" information for the PSS clients and servers in a
prioritised order.

1. RTCP

2. OPTIONS method with Session header field

NOTE: Both servers and clients can initiate this OPTIONS method. 

The client should send the same wellness information in 'Ready' state as in 'Playing' and 'Recording' states, and the 
server should detect the same client's wellness information in 'Ready' state as in 'Playing' and 'Recording' states. In 
particular, the same link aliveness mechanism should be managed following a 'PAUSE' request and response. 
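
An OPTIONS request carrying the Session header, as recommended in item 2 above, could be assembled as follows. This is a sketch under assumptions: the function name build_options_keepalive is hypothetical, and the URL, session identifier and User-Agent values are taken from the example in clause A.2.1 purely for illustration.

```python
def build_options_keepalive(url, cseq, session_id, user_agent=None):
    """Build the text of an RTSP OPTIONS request that refers to an existing session."""
    lines = [
        f"OPTIONS {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session_id}",
    ]
    if user_agent:
        lines.append(f"User-Agent: {user_agent}")
    # RTSP header lines end with CRLF; an empty line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_options_keepalive(
    "rtsp://mediaserver.com/movie.test", 8, "dfhyrio90llk", "TheStreamClient/1.1b2"
)
```

The interval at which such a request is repeated is not shown here; it would be chosen so that the session does not time out on the server.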

A.3 RTP
A.3.1 General 

Void. 

A.3.2 Implementation guidelines 
A.3.2.1 Maximum RTP packet size 

The RFC 1889 (RTP) [9] does not impose a maximum size on RTP packets. However, when RTP packets are sent over 
the radio link of a 3GPP PSS system there is an advantage in limiting the maximum size of RTP packets. 

Two types of bearers can be envisioned for streaming using either acknowledged mode (AM) or unacknowledged mode 
(UM) RLC. The AM uses retransmissions over the radio link whereas the UM does not. In UM mode large RTP packets 
are more susceptible to losses over the radio link compared to small RTP packets since the loss of a segment may result 
in the loss of the whole packet. On the other hand in AM mode large RTP packets will result in larger delay jitter 
compared to small packets as there is a larger chance that more segments have to be retransmitted. 

For these reasons it is recommended that the maximum size of RTP packets should be limited, taking into
account the wireless link. This will decrease the RTP packet loss rate particularly for RLC in UM. For RLC in AM the
delay jitter will be reduced, permitting the client to use a smaller receiving buffer. It should also be noted that RTP
packets that are too small could result in too much overhead, if IP/UDP/RTP header compression is not applied, or in
unnecessary load at the streaming server.
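
The trade-off between packet loss and header overhead can be illustrated numerically. The sketch below is not part of the specification: the function names are hypothetical, the header sizes assume IPv4 without options and an RTP header without CSRC entries or extensions, and the loss model assumes that radio-link segments are lost independently, which is a simplification.

```python
import math

IPV4_HEADER = 20  # bytes, without options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, fixed header without CSRC entries or extension


def header_overhead(payload_bytes):
    """Fraction of each uncompressed IP/UDP/RTP packet that is header."""
    headers = IPV4_HEADER + UDP_HEADER + RTP_HEADER
    return headers / (headers + payload_bytes)


def packet_loss_probability(packet_bytes, segment_bytes, segment_loss_rate):
    """Probability that a packet is lost in UM when any one of its segments is lost.

    Assumption: segment losses are independent of each other.
    """
    segments = math.ceil(packet_bytes / segment_bytes)
    return 1.0 - (1.0 - segment_loss_rate) ** segments


small = (header_overhead(60), packet_loss_probability(100, 40, 0.01))
large = (header_overhead(960), packet_loss_probability(1000, 40, 0.01))
```

Under these assumptions the small packet pays a much higher header overhead, while the large packet is considerably more likely to be lost, which is the tension described in the text above.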

In the case of transporting video in the payload of RTP packets it may be that a video frame is split into more than one 
RTP packet in order not to produce too large RTP packets. Then, to be able to decode packets following a lost packet in 
the same video frame, it is recommended that synchronisation information be inserted at the start of such RTP packets. 
For H.263 this implies the use of GOBs with non-empty GOB headers and in the case of MPEG-4 video the use of 
video packets (resynchronisation markers). If the optional Slice Structured mode (Annex K) of H.263 is in use, GOBs 
are replaced by slices. 

A.3.2.2 Sequence number and timestamp in the presence of NPT jump

The description below is intended to give more understanding of how RTP sequence number and timestamp are 
specified within the 3GPP PSS in the presence of NPT jumps. The jump happens when a client sends a PLAY request 
to skip media. 

The RFC 2326 (RTSP) [5] specifies that both RTP sequence numbers and RTP timestamps must be continuous and 
monotonic across jumps of NPT. Thus when a server receives a request for a skip of the media that causes a jump of 
NPT, it shall specify RTP sequence numbers and RTP timestamps continuously and monotonically across the skip of 
the media to conform to the RTSP specification. Also, the server may respond with "seq" in the RTP-Info field if this
parameter is known at the time of issuing the response. 
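
One possible way for a server to keep both counters continuous across an NPT jump is sketched below. The class RtpStreamState and its methods are hypothetical and only illustrate the principle; the sketch ignores wall-clock gaps such as the time spent in a PAUSE, and it handles wrap-around only by simple modulo arithmetic.

```python
class RtpStreamState:
    """Tracks RTP sequence number and timestamp for one stream on the server side."""

    def __init__(self, clock_rate, first_seq, first_timestamp):
        self.clock_rate = clock_rate
        self.next_seq = first_seq
        self.next_timestamp = first_timestamp
        self.npt_anchor = 0.0             # NPT that corresponds to ts_anchor
        self.ts_anchor = first_timestamp  # RTP timestamp at npt_anchor

    def send(self, duration_s):
        """Emit one packet covering duration_s of media; return (seq, timestamp)."""
        seq, ts = self.next_seq, self.next_timestamp
        self.next_seq = (seq + 1) % 2**16
        self.next_timestamp = (ts + round(duration_s * self.clock_rate)) % 2**32
        return seq, ts

    def seek(self, npt_s):
        """Handle a PLAY with a new Range: move the NPT anchor, keep the counters running."""
        self.npt_anchor = npt_s
        self.ts_anchor = self.next_timestamp
        # These are the values the server could report in RTP-Info.
        return {"seq": self.next_seq, "rtptime": self.ts_anchor}

    def npt_of(self, timestamp):
        """Map an RTP timestamp back to NPT, as a client would do using RTP-Info."""
        delta = (timestamp - self.ts_anchor) % 2**32
        return self.npt_anchor + delta / self.clock_rate
```

The point of the sketch is that seek() never resets next_seq or next_timestamp; only the mapping between RTP timestamp and NPT changes, and that mapping is what the RTP-Info header conveys to the client.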

A.3.2.3 RTCP transmission interval

In RTP [9], Section 6.2, rules for the calculation of the interval between the sending of two consecutive RTCP packets, 
i.e. the RTCP transmission interval, are defined. These rules consist of two steps: 
Step 1: An algorithm that calculates a transmission interval from parameters such as the session bit rate and the
average RTCP packet size. This algorithm is described in [9], annex A.7.

Step 2: Taking the maximum of the transmission interval computed in step 1 and a mandatory fixed minimum
RTCP transmission interval of 5 seconds.

Implementations conforming to this TS shall perform step 1 and may perform step 2. All other algorithms and rules of
[9] stay valid and shall be followed.

Following these recommendations results in regular sending of RTCP messages, where the interval between them
depends on the session bandwidth and the RTCP packet size.
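
The two steps can be sketched as follows. This is a simplified illustration loosely modelled on the computation in [9], annex A.7, not a replacement for it: the function name and the apply_minimum flag are assumptions made here, and the randomisation of the interval, the running average of the packet size and the reduced minimum for the first packet are deliberately left out.

```python
RTCP_MIN_TIME = 5.0             # seconds, fixed minimum interval of step 2
RTCP_SENDER_BW_FRACTION = 0.25  # share of the RTCP bandwidth reserved for senders


def rtcp_interval(members, senders, rtcp_bw, we_sent, avg_rtcp_size, apply_minimum=False):
    """Deterministic part of the RTCP transmission interval, in seconds.

    rtcp_bw is the RTCP bandwidth in octets per second and avg_rtcp_size
    the average RTCP packet size in octets.
    """
    n = members
    if 0 < senders < members * RTCP_SENDER_BW_FRACTION:
        # Few senders: they share a dedicated fraction of the RTCP bandwidth.
        if we_sent:
            rtcp_bw *= RTCP_SENDER_BW_FRACTION
            n = senders
        else:
            rtcp_bw *= 1.0 - RTCP_SENDER_BW_FRACTION
            n -= senders
    interval = avg_rtcp_size * n / rtcp_bw       # step 1
    if apply_minimum:
        interval = max(interval, RTCP_MIN_TIME)  # step 2, optional in PSS
    return interval


# Two members (server and client), 400 octets/s of RTCP bandwidth, 100-octet reports.
without_minimum = rtcp_interval(2, 1, 400.0, False, 100.0)
with_minimum = rtcp_interval(2, 1, 400.0, False, 100.0, apply_minimum=True)
```

With these illustrative numbers step 1 alone yields an interval well below 5 seconds, which shows why making step 2 optional allows more frequent RTCP reporting.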



A.4 Capability exchange 
A.4.1 Overview 

Clause A.4 provides detailed information about the structure and exchange of device capability descriptions for the 
PSS. It complements the normative part contained in clause 5.2 of the present document. 

The functionality is sometimes referred to as capability exchange. Capability exchange in PSS uses the CC/PP [39]
framework and reuses parts of the CC/PP application UAProf [40].

To facilitate server-side content negotiation for streaming, the PSS server needs to have access to a description of the 
specific capabilities of the mobile terminal, i.e. the device capability description. The device capability description 
contains a number of attributes. During the set-up of a streaming session the PSS server can use the description to 
provide the mobile terminal with the correct type of multimedia content. Concretely, it is envisaged that servers use 
information about the capabilities of the mobile terminal to decide which stream(s) to provision to the connecting 
terminal. For instance, the server could compare the requirements on the mobile terminal for multiple available variants 
of a stream with the actual capabilities of the connecting terminal to determine the best-suited stream(s) for that 
particular terminal. A similar mechanism could also be used for other types of content. 
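
A comparison of this kind could look as follows. The matching rules are left to the implementation, so this is a purely hypothetical sketch: the function pick_variant, the dictionary keys and the selection criterion (highest decoding byte rate that still fits) are all assumptions made for illustration.

```python
def pick_variant(variants, capabilities):
    """Return the highest-rate variant whose requirements the terminal meets, or None."""

    def fits(variant):
        return (
            variant["mime"] in capabilities["accept"]
            and variant["decoding_byte_rate"] <= capabilities["decoding_byte_rate"]
            and variant["buffer_size"] <= capabilities["buffer_size"]
        )

    candidates = [v for v in variants if fits(v)]
    return max(candidates, key=lambda v: v["decoding_byte_rate"], default=None)


variants = [
    {"name": "low", "mime": "video/MP4V-ES", "decoding_byte_rate": 8000, "buffer_size": 20480},
    {"name": "mid", "mime": "video/MP4V-ES", "decoding_byte_rate": 16000, "buffer_size": 30720},
    {"name": "high", "mime": "video/MP4V-ES", "decoding_byte_rate": 32000, "buffer_size": 61440},
]
terminal = {"accept": {"video/MP4V-ES"}, "decoding_byte_rate": 16000, "buffer_size": 30720}
best = pick_variant(variants, terminal)
```

A real server would probably weigh more attributes and might fall back to a default stream when no variant fits, rather than returning nothing.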

A device capability description contains a number of device capability attributes. In the present document they are
referred to as just attributes. The current version of PSS does not include a definition of any specific user preference
attributes. Therefore the term device capability description is used. However, it should be noted that even though no
specific user preference attributes are included, simple tailoring to the preferences of the user could be achieved by
temporary overrides of the available attributes. For example, if the user would like to receive only mono sound for a
particular session even though the terminal is capable of stereo, this can be accomplished by providing an override for
the "AudioChannels" attribute. It should also be noted that the extension mechanism defined would enable an easy
introduction of specific user preference attributes in the device capability description if needed.

The term device capability profile or profile is sometimes used instead of device capability description to describe a 
description of device capabilities and/or user preferences. The three terms are used interchangeably in the present 
document. 

Figure A.1 illustrates how capability exchange in PSS is performed. In the simplest case the mobile terminal informs
the PSS server(s) about its identity so that the latter can retrieve the correct device capability profile(s) from the device
profile server(s). For this purpose, the mobile terminal adds one or several URLs to RTSP and/or HTTP protocol data
units that it sends to the PSS server(s). These URLs point to locations on one or several device profile servers from
where the PSS server should retrieve the device capability profiles. This list of URLs is encapsulated in RTSP and
HTTP protocol data units using additional header field(s). The list of URLs is denoted URLdesc. The mobile terminal
may supplement the URLdesc with extra attributes or overrides for attributes already defined in the profile(s) located at
URLdesc. This information is denoted Profdiff. Like URLdesc, Profdiff is encapsulated in RTSP and HTTP protocol data
units using additional header field(s).

The device profile server in Figure A.1 is the logical entity that stores the device capability profiles. The profile needed
for a certain request from a mobile terminal may be stored on one or several such servers. A terminal manufacturer or a 
software vendor could maintain a device profile server to provide device capability profiles for its products. It would 
also be possible for an operator to manage a device profile server for its subscribers and then e.g. enable the subscriber 
to make user specific updates to the profiles. The device profile server provides device capability profiles to the PSS 
server on request. 
Figure A.1: Functional components in PSS capability exchange

The PSS server is the logical entity that provides multimedia streams and other, static content (e.g. SMIL documents,
images, and graphics) to the mobile terminal (see Figure A.1). A PSS application might involve multiple PSS servers,
e.g. separate servers for multimedia streams and for static content. A PSS server handles the matching process.
Matching is a process that takes place in the PSS servers (see Figure A.1). The device capability profile is compared
with the content descriptions at the server and the best fit is delivered to the client.



A.4.2 Scope of the specification 



The following list describes what is considered to be within the scope of the specification for capability exchange
in PSS.

- Definition of the structure for the device capability profiles, see clause A.4.3.

- Definition of the CC/PP vocabularies, see clause A.4.4.

- Reference to a set of device capability attributes for multimedia content retrieval applications that have
  already been defined by UAProf [40]. The purpose of this reference is to point out which attributes are useful
  for the PSS application.

- Definition of a set of device capability attributes specifically for PSS applications that are missing in UAProf.

- It is important to define an extension mechanism to easily add attributes since it is not possible to cover all
  attributes from the beginning. The extension mechanism is described in clause A.4.5.

- The structure of URLdesc, Profdiff and their interchange is described in clause A.4.6.

- Protocols for the interchange of device capability profiles between the PSS server and the device profile server
  are defined in clause 5.2.7.

The specification does not include:

- rules for the matching process on the PSS server. These mechanisms should be left to the implementations. For
  interoperability, only the format of the device capability description and its interchange is relevant.

- definition of specific user preference attributes. It is very difficult to standardise such attributes since they are
  dependent on the type of personalised services one would like to offer the user. The extensible description
  format and exchange mechanism proposed in this document provide the means to create and exchange such
  attributes if needed in the future. However, as explained in clause A.4.1, limited tailoring to the preferences of the
  user could be achieved by temporarily overriding available attributes in the vocabularies already defined for PSS.
  The vocabulary also includes some very basic user preference attributes. For example, the profile includes a list
  of preferred languages. Also the list of MIME types can be interpreted as a user preference, e.g. leaving out audio
  MIME types could mean that the user does not want to receive any audio content. The available attributes are
  described in clause 5.2.3 of the present document.

- requirements for caching of device capability profiles on the PSS server. In UAProf, a content server can cache
  the current device capability profile for a given WSP session. This feature relies on the presence of WSP
  sessions. Caching significantly increases the complexity of both the implementations of the mobile terminal and
  the server. However, HTTP is used between the PSS server and the device profile server. For this exchange,
  normal content caching provisions as defined by HTTP apply and the PSS server may utilise this to speed up the
  session set-up (see clause 5.2.7).

- intermediate proxies. This feature is considered not relevant in the context of PSS applications.

A.4.3 The device capability profile structure 

A device capability profile is a description of the capabilities of the device and possibly also the preferences of the user
of that device. It can be used to guide the adaptation of content presented to the device. A device capability profile for
PSS is an RDF [41] document that follows the structure of the CC/PP framework [39] and the CC/PP application UAProf
[40]. The terminology of CC/PP is used in this text and is therefore briefly described here.

Attributes are used for specifying the device capabilities and user preferences. A set of attribute names, permissible
values and semantics constitute a CC/PP vocabulary. An RDF schema defines a vocabulary. The syntax of the attributes
is defined in the schema but also, to some extent, the semantics. A profile is an instance of a schema and contains one or
more attributes from the vocabulary. Attributes in a schema are divided into components distinguished by attribute
characteristics. In the CC/PP specification it is anticipated that different applications will use different vocabularies.
According to the CC/PP framework a hypothetical profile might look like Figure A.2. A further illustration of what a
profile might look like is given in the example in clause A.4.7.



[MyPhone]
 |
 +--ccpp:component--> [TerminalHardware]
 |                     |
 |                     +--rdf:type----------> [prf:HardwarePlatform]
 |                     +--prf:ColorCapable--> "Yes"
 |                     +--prf:BitsPerPixel--> "4"
 |
 +--ccpp:component--> [Streaming]
                       |
                       +--rdf:type--------> [pss:Streaming]
                       +--pss:PssVersion--> "3GPP-R5"


Figure A.2: Illustration of the profile structure 

A CC/PP schema is extended through the introduction of new attribute vocabularies and a device capability profile can 
use attributes drawn from an arbitrary number of different vocabularies. Each vocabulary is associated with a unique 
XML namespace. This mechanism makes it possible to reuse attributes from other vocabularies. It should be mentioned 
that the prefix ccpp identifies elements of the CC/PP namespace (URI http://www.w3.org/2000/07/04-ccpp),
prf identifies elements of the UAProf namespace (URI http://www.wapforum.org/profiles/UAPROF/ccppschema-20010330),
rdf identifies elements of the RDF namespace (URI http://www.w3.org/1999/02/22-rdf-syntax-ns) and pss
identifies elements of the Streaming namespace (URI http://www.3gpp.org/profiles/PSS/ccppschema-PSS5).

Attributes of a component can be included directly or may be specified by a reference to a CC/PP default profile.
Resolving a profile that includes a reference to a default profile is time-consuming. When the PSS server receives the
profile from a device profile server, the final attribute values cannot be determined until the default profile has been
requested and received. Support for defaults is required by the CC/PP specification [39]. Due to these problems,
clause 5.2.6 recommends not using the CC/PP defaults element in PSS device capability profile
documents.

A.4.4 CC/PP Vocabularies 

A CC/PP vocabulary shall, according to CC/PP and UAProf, include:

- An RDF schema for the vocabulary based on the CC/PP schema.

- A description of the semantics/type/resolution rules/sample values for each attribute.

- A unique namespace shall be assigned to each version of the profile schema.

Additional information that could be included in the profile schema:

- A description of the profile schema, i.e. the purpose of the profile, how to use it, when to use it etc.

- A description of extensibility, i.e. how to handle future extensions of the profile schema.

A device capability profile can use an arbitrary number of vocabularies and thus it is possible to reuse attributes from 
other vocabularies by simply referencing the corresponding namespaces. The focus of the PSS vocabulary is content 
formatting which overlaps the focus of the UAProf vocabulary. UAProf is specified by WAP Forum and is an 
architecture and vocabulary/schema for capability exchange in the WAP environment. Since there are attributes in the 
UAProf vocabulary suitable for streaming applications these are reused and combined with a PSS application specific 
streaming component. This makes the PSS vocabulary an extension vocabulary to UAProf. The CC/PP specification 
encourages reuse of attributes from other vocabularies. To avoid confusion, the same attribute name should not be used 
in different vocabularies. In clause 5.2.3.3 a number of attributes from UAProf [40] are recommended for PSS. The 
PSS base vocabulary is defined in clause 5.2.3.2. 

A profile is allowed to instantiate a subset of the attributes in the vocabularies and no specific attributes are required,
but an insufficient description may result in content being delivered that the client is unable to present.

A.4.5 Principles of extending a schema/vocabulary 

The use of RDF enables an extensibility mechanism for CC/PP-based schemas that addresses the evolution of new types
of devices and applications. The PSS profile schema specification provides a base vocabulary, but in the
future new usage scenarios might need to express new attributes. For this reason it is necessary to
specify how extensions of the schema will be handled. If the TSG responsible for the present document updates the base
vocabulary schema, a new unique namespace will be assigned to the updated schema. In another scenario the TSG may
decide to add a new component containing specific user related attributes. This new component will be assigned a new
namespace and it will not influence the base vocabulary in any way. If other organisations or companies make
extensions, this can be done either as a new component or as attributes added to the existing base vocabulary component,
where the new attributes use a new namespace. This ensures that third parties can define and maintain their own
vocabularies independently from the PSS base vocabulary.

A.4.6 Signalling of profile information between client and server 

URLdesc and Profdiff were introduced in clause A.4.1. The URLdesc is a list of URLs that point to locations on device
profile servers from where the PSS server retrieves suitable device capability profiles. The Profdiff contains additional
capability description information, e.g. overrides for certain attribute values. Both URLdesc and Profdiff are
encapsulated in RTSP and HTTP messages using additional header fields. This can be seen in Figure A.1. In clause 9.1
of [40] three new HTTP headers are defined that can be used to implement the desired functionality: "x-wap-profile",
"x-wap-profile-diff" and "x-wap-profile-warning". These headers are reused in PSS for both HTTP and RTSP.

- The "x-wap-profile" is a request header that contains a list of absolute URLs to device capability descriptions
  and profile diff names. The profile diff names correspond to additional profile information in the
  "x-wap-profile-diff" header.

- The "x-wap-profile-diff" is a request header that contains a subset of a device capability profile.

- The "x-wap-profile-warning" is a response header that contains error codes explaining to what extent the server
  has been able to match the terminal request.

Clause 5.2.5 of the present document defines this exchange mechanism. 

It is left to the mobile terminal to decide when to send x-wap-profile headers. The mobile terminal could send the
"x-wap-profile" and "x-wap-profile-diff" headers with each RTSP DESCRIBE and/or with each RTSP SETUP request.
Sending them in the RTSP DESCRIBE request is useful for the PSS server to be able to make a better decision on which
presentation description to provision to the client. Sending the "x-wap-profile" and "x-wap-profile-diff" headers with an
HTTP request is useful whenever the mobile terminal requests some multimedia content that will be used in the PSS
application. For example it can be sent with the request for a SMIL file and the PSS server can see to it that the mobile
terminal receives a SMIL file which is optimised for the particular terminal. Clause 5.2.5 of the present document gives
recommendations for when profile information should be sent.
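
A DESCRIBE request that carries the URLdesc could be assembled as sketched below. This is an illustration only: the function name build_describe is hypothetical, the profile URL is the one used in the example of clause A.4.7, and the exact header syntax is defined in clause 5.2.5 and in [40]; writing each URL as a quoted string separated by commas is an assumption made here.

```python
def build_describe(url, cseq, profile_urls, user_agent=None):
    """Build an RTSP DESCRIBE request, optionally with an x-wap-profile header."""
    lines = [f"DESCRIBE {url} RTSP/1.0", f"CSeq: {cseq}"]
    if user_agent:
        lines.append(f"User-Agent: {user_agent}")
    if profile_urls:
        # Assumption: each URL is sent as a quoted string, comma separated.
        quoted = ", ".join(f'"{u}"' for u in profile_urls)
        lines.append(f"x-wap-profile: {quoted}")
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_describe(
    "rtsp://mediaserver.com/movie.test",
    1,
    ["http://www.bar.com/Phones/Phone007"],
    user_agent="TheStreamClient/1.1b2",
)
```

The same header line could be added to an HTTP request for a SMIL file; an "x-wap-profile-diff" header is not shown because its encoding is outside what this sketch covers.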

It is up to the PSS server to retrieve the device capability profiles using the URLs in the "x-wap-profile" header. The
PSS server is also responsible for merging the profiles it receives. If the "x-wap-profile-diff" header is present, the
server must also merge that information with the retrieved profiles. This functionality is defined in clause 5.2.6.
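
The merging step can be pictured as layering attribute sets on top of each other. The sketch below is a deliberate simplification: the function merge_profiles and the dictionary representation of a profile are hypothetical, and it applies a plain "later value replaces earlier value" rule, whereas the actual resolution rules are defined per attribute in clause 5.2.6 and in the vocabularies.

```python
def merge_profiles(profiles, profile_diff=None):
    """Merge retrieved profiles in order, then apply the optional Profdiff on top.

    Each profile is a dict of the form {component: {attribute: value}}.
    Assumption: a later value simply replaces an earlier one.
    """
    layers = list(profiles)
    if profile_diff:
        layers.append(profile_diff)
    merged = {}
    for layer in layers:
        for component, attributes in layer.items():
            merged.setdefault(component, {}).update(attributes)
    return merged


hardware = {"HardwarePlatform": {"BitsPerPixel": "4", "ColorCapable": "Yes"}}
streaming = {"Streaming": {"AudioChannels": "Stereo", "PssVersion": "3GPP-R5"}}
diff = {"Streaming": {"AudioChannels": "Mono"}}
result = merge_profiles([hardware, streaming], diff)
```

In this illustration the Profdiff overrides "AudioChannels" for the session, which corresponds to the mono-sound example given in clause A.4.1.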

It should be noted that it is up to the implementation of the mobile terminal which URLs to send in the "x-wap-profile"
header. For instance, a terminal could just send one URL that points to a complete description of its capabilities.
Another terminal might provide one URL that points to a description of the terminal hardware, a second URL that
points to a description of a particular software version of the streaming application, and a third URL that points to the
description of a hardware or software plug-in that is currently added to the standard configuration of that terminal. From
this example it becomes clear that sending URLs from the mobile terminal to the server is sufficient not only for
static profiles but that it can also handle re-configurations of the mobile terminal such as software version changes,
software plug-ins, hardware upgrades, etc.

As described above, the list of URLs in the "x-wap-profile" header is a powerful tool to handle dynamic changes of the
mobile terminal. The "x-wap-profile-diff" header could also be used to facilitate the same functionality. Using the
"x-wap-profile-diff" header to send, for example, a complete profile (no URL present at all in the "x-wap-profile"
header) or updates resulting from, for example, a hardware plug-in is not recommended unless some compression scheme
is applied over the air interface. The reason is that the size of a profile may be large.

A.4.7 Example of a PSS device capability description 

The following is an example of a device capability profile as it could be available from a device profile server. The 
XML document includes the description of the imaginary "Phone007" phone. 

Instead of a single XML document the description could also be spread over several files. The PSS server would need to 
retrieve these profiles separately in this case and would need to merge them. For instance, this would be useful when 
device capabilities of this phone that are related to streaming would differ among different versions of the phone. In this 
case the part of the profile for streaming would be separated from the rest into its own profile document. This separation 
allows describing the difference in streaming capabilities by providing multiple versions of the profile document for the 
streaming capabilities. 

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns"
         xmlns:ccpp="http://www.w3.org/2000/07/04-ccpp"
         xmlns:prf="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010330"
         xmlns:pss5="http://www.3gpp.org/profiles/PSS/ccppschema-PSS5">
  <rdf:Description rdf:about="http://www.bar.com/Phones/Phone007">
    <ccpp:component>
      <rdf:Description ID="HardwarePlatform">
        <rdf:type rdf:resource="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010330#HardwarePlatform"/>
        <prf:BitsPerPixel>4</prf:BitsPerPixel>
        <prf:ColorCapable>Yes</prf:ColorCapable>
        <prf:PixelAspectRatio>1x2</prf:PixelAspectRatio>
        <prf:PointingResolution>Pixel</prf:PointingResolution>
        <prf:Model>Phone007</prf:Model>
        <prf:Vendor>Ericsson</prf:Vendor>
      </rdf:Description>
    </ccpp:component>
    <ccpp:component>
      <rdf:Description ID="SoftwarePlatform">
        <rdf:type rdf:resource="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010330#SoftwarePlatform"/>
        <prf:CcppAccept-Charset>
          <rdf:Bag>
            <rdf:li>UTF-8</rdf:li>
            <rdf:li>ISO-10646-UCS-2</rdf:li>
          </rdf:Bag>
        </prf:CcppAccept-Charset>
        <prf:CcppAccept-Encoding>
          <rdf:Bag>
            <rdf:li>base64</rdf:li>
            <rdf:li>quoted-printable</rdf:li>
          </rdf:Bag>
        </prf:CcppAccept-Encoding>
        <prf:CcppAccept-Language>
          <rdf:Seq>
            <rdf:li>en</rdf:li>
            <rdf:li>se</rdf:li>
          </rdf:Seq>
        </prf:CcppAccept-Language>
      </rdf:Description>
    </ccpp:component>
    <ccpp:component>
      <rdf:Description ID="Streaming">
        <rdf:type rdf:resource="http://www.3gpp.org/profiles/PSS/ccppschema-PSS5#Streaming"/>
        <pss5:AudioChannels>Stereo</pss5:AudioChannels>
        <pss5:VideoPreDecoderBufferSize>30720</pss5:VideoPreDecoderBufferSize>
        <pss5:VideoInitialPostDecoderBufferingPeriod>0</pss5:VideoInitialPostDecoderBufferingPeriod>
        <pss5:VideoDecodingByteRate>16000</pss5:VideoDecodingByteRate>
        <pss5:RenderingScreenSize>73x50</pss5:RenderingScreenSize>
        <pss5:PssAccept>
          <rdf:Bag>
            <rdf:li>audio/AMR-WB;octet-alignment</rdf:li>
            <rdf:li>video/MP4V-ES</rdf:li>
          </rdf:Bag>
        </pss5:PssAccept>
        <pss5:PssAccept-Subset>
          <rdf:Bag>
            <rdf:li>JPEG-PSS</rdf:li>
          </rdf:Bag>
        </pss5:PssAccept-Subset>
        <pss5:PssVersion>3GPP-R5</pss5:PssVersion>
        <pss5:RenderingScreenSize>70x40</pss5:RenderingScreenSize>
        <pss5:SmilBaseSet>SMIL-3GPP-R4</pss5:SmilBaseSet>
        <pss5:SmilModules>
          <rdf:Bag>
            <rdf:li>BasicTransitions</rdf:li>
            <rdf:li>MultiArcTiming</rdf:li>
          </rdf:Bag>
        </pss5:SmilModules>
      </rdf:Description>
    </ccpp:component>
  </rdf:Description>
</rdf:RDF>
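
A PSS server could read the attributes of such a profile along the following lines. This is a minimal sketch using the Python standard library, not a full RDF processor: the function read_profile is hypothetical, the namespace URIs are taken as written in the example above, and the shortened SAMPLE document exists only to keep the sketch self-contained.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the example profile.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns"
CCPP_NS = "http://www.w3.org/2000/07/04-ccpp"


def read_profile(xml_text):
    """Return {component ID: {attribute local name: value or list of values}}."""
    root = ET.fromstring(xml_text)
    profile = {}
    for component in root.iter(f"{{{CCPP_NS}}}component"):
        description = component.find(f"{{{RDF_NS}}}Description")
        if description is None:
            continue
        attributes = {}
        for child in description:
            name = child.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
            if name == "type":
                continue  # rdf:type identifies the component, it is not an attribute
            items = [(li.text or "").strip() for li in child.iter(f"{{{RDF_NS}}}li")]
            # Bag/Seq values become lists; simple values stay strings.
            attributes[name] = items if items else (child.text or "").strip()
        profile[description.get("ID", "")] = attributes
    return profile


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns"
    xmlns:ccpp="http://www.w3.org/2000/07/04-ccpp"
    xmlns:pss5="http://www.3gpp.org/profiles/PSS/ccppschema-PSS5">
  <rdf:Description rdf:about="http://www.bar.com/Phones/Phone007">
    <ccpp:component>
      <rdf:Description ID="Streaming">
        <rdf:type rdf:resource="http://www.3gpp.org/profiles/PSS/ccppschema-PSS5#Streaming"/>
        <pss5:AudioChannels>Stereo</pss5:AudioChannels>
        <pss5:PssAccept>
          <rdf:Bag>
            <rdf:li>audio/AMR-WB;octet-alignment</rdf:li>
            <rdf:li>video/MP4V-ES</rdf:li>
          </rdf:Bag>
        </pss5:PssAccept>
      </rdf:Description>
    </ccpp:component>
  </rdf:Description>
</rdf:RDF>"""

profile = read_profile(SAMPLE)
```

Note that an attribute appearing twice in one component, as "RenderingScreenSize" does in the example above, would simply keep its last value in this sketch; a real implementation would need to decide how to treat such duplicates.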
Annex B (informative):
SMIL authoring guidelines 

B.1 General 

This is an informative annex for SMIL presentation authors. Authors can expect that PSS clients can handle the SMIL 
module collection defined in clause 8.2, with the restrictions defined in this Annex. When creating SMIL documents the 
author is recommended to consider that terminals may have small displays and simple input devices. The media types 
and their encoding included in the presentation should be restricted to what is described in clause 7 of the present 
document. Considering that many mobile devices may have limited software and hardware capabilities, the number of 
media to be played simultaneously should be limited. For example, many devices will not be able to handle more than one 
video sequence at a time. 



B.2 BasicLinking 



The Linking Modules define elements and attributes for navigational hyperlinking, either through user interaction or 
through temporal events. The BasicLinking module defines the "a" and "area" elements for basic linking: 

a       Similar to the "a" element in HTML, it provides a link from a media object through the href attribute (which 
        contains the URI of the link's destination). The "a" element includes a number of attributes for defining the 
        behaviour of the presentation when the link is followed. 

area    Whereas the "a" element only allows a link to be associated with a complete media object, the "area" element 
        allows links to be associated with spatial and/or temporal portions of a media object. 

The area element may be useful for enabling services that rely on interactivity where the display size is not big enough 
to allow the display of links alongside a media (e.g. QCIF video) window. Instead, the user could, for example, click on 
a watermark logo displayed in the video window to visit the company website. 

Even though the "area" element may be useful, some mobile terminals will not be able to handle "area" elements that 
include multiple selectable regions. One reason for this could be that the terminals do not have the 
appropriate user interface. Such "area" elements should therefore be avoided. Instead it is recommended that the "a" 
element be used. If the "area" element is used, the SMIL presentation should also include alternative links to navigate 
through the presentation; i.e. the author should not create presentations that rely on the player being able to handle 
"area" elements. 



B.3 BasicLayout 

The "fit" attribute defines how different media should be fitted into their respective display regions. 

The rendering and layout of some objects on a small display might be difficult, and not all mobile devices support 
features such as scroll bars; in addition, the root-layout window may represent the full screen of the display. Therefore 
"fit=scroll" should not be used. 

Due to hardware restrictions in mobile devices, operations such as scaling of a video sequence, or even of images, may 
be very difficult to achieve. According to the SMIL 2.0 specification, SMIL players may in these situations clip the 
content instead. To be sure that the presentation is displayed as the author intended, content should be encoded in a 
size suitable for the targeted terminals and it is recommended to use "fit=hidden". 

B.4 EventTiming 



The two attributes "endEvent" and "repeatEvent" in the EventTiming module may cause problems for a mobile SMIL 
player. The end of a media element triggers the "endEvent". In the same way, the "repeatEvent" occurs when the second 
and subsequent iterations of a repeated element begin playback. Both events rely on the SMIL player receiving 
information that the media element has ended. One example is when the end of a video sequence initiates 
the event. If the player has not received explicit information about the duration of the video sequence, e.g. from the 
"dur" attribute in SMIL or from an external source such as the "a=range" field in SDP, it will have to rely on the RTCP 
BYE message to decide when the video sequence ends. If the RTCP BYE message is lost, the player will have problems 
initiating the event. For these reasons it is recommended that the "endEvent" and "repeatEvent" attributes be used with 
care and that, if they are used, the player be provided with additional information about the duration of the media 
element that triggers the event. This additional information could be, for example, the "dur" attribute in SMIL or the 
"a=range" field in SDP. 

The "inBoundsEvent" and "outOfBoundsEvent" attributes assume that the terminal has a pointer device for moving the 
focus to within a window (i.e. clicking within a window). Not all terminals will support this functionality since they do 
not have the appropriate user interface. Hence care should be taken in using these particular event triggers. 

B.5 MetaInformation 

Authors are encouraged to make use of meta data whenever providing such information to the mobile terminal appears 
to be useful. However, they should keep in mind that some mobile terminals will parse but not process the meta data. 

Furthermore, authors should keep in mind that excessive use of meta data will substantially increase the file size of the 
SMIL presentation that needs to be transferred to the mobile terminal. This may result in longer set-up times. 

B.6 XML entities 

Entities are a mechanism to insert XML fragments inside an XML document. Entities can be internal, essentially a 
macro expansion, or external. Use of XML entities in SMIL presentations is not recommended, as many current XML 
parsers do not fully support them. 

B.7 XHTML Mobile Profile 

When rendering text in a SMIL presentation, authors are able to use XHTML Mobile Profile [47], which contains thirteen 
modules. However, some of the modules include non-text information. When referring to an XHTML Mobile Profile 
document from a SMIL document, authors should use only the required XHTML Host Language modules: Structure 
Module, Text Module, Hypertext Module and List Module. The Image Module, in particular, should not be 
used. Images and other non-text contents should be included in the SMIL document. 

NOTE: An XHTML file including a module which is not part of the XHTML Host Language modules may not 
be shown as intended. Also, an XHTML file which uses elements or attributes from the required 
XHTML Host Language modules and which uses elements or attributes that are not included in XHTML 
Basic Profile [28], may not render correctly on legacy handsets which implement only XHTML Basic. 
These are: 

The start attribute on the 'ol' element in the List module 

The value attribute on the 'li' element in the List module 

The 'b' element in the Presentation module 

The 'big' element in the Presentation module 

The 'hr' element in the Presentation module 

The 'i' element in the Presentation module 

The 'small' element in the Presentation module 

Annex C (normative): 
MIME media types 

C.1 MIME media type H263-2000 

MIME media type name: video 
MIME subtype name: H263-2000 

Required parameters: None 

Optional parameters: 

profile: H.263 profile number, in the range 0 through 8, specifying the supported H.263 annexes/subparts. 

level: Level of bitstream operation, in the range 0 through 99, specifying the level of computational complexity of the 
decoding process. When no profile and level parameters are specified, Baseline Profile (Profile 0) level 10 are the 
default values. 

The profile and level specifications can be found in [23]. Note that the RTP payload format for H263-2000 is the same 
as for H263-1998 and is defined in [14], but additional annexes/subparts are specified along with the profiles and levels. 
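
The defaulting rule for the optional parameters can be illustrated with a minimal Python sketch. The function name `parse_h263_2000_params` and the "name=value; name=value" input syntax are assumptions made for illustration and are not defined by this annex. The sketch also assumes that both ranges start at 0 and that a parameter which is absent on its own takes its individual default.

```python
def parse_h263_2000_params(params_text: str) -> tuple[int, int]:
    """Return (profile, level) for an H263-2000 parameter string.

    Hypothetical helper: the "name=value; name=value" syntax is an assumption.
    """
    params = {}
    for item in params_text.split(";"):
        if "=" in item:
            name, value = item.split("=", 1)
            params[name.strip().lower()] = value.strip()

    # Baseline Profile (Profile 0) and level 10 are the defaults from the text.
    profile = int(params.get("profile", 0))
    level = int(params.get("level", 10))

    # Range checks; the lower bound of 0 is an assumption of this sketch.
    if not 0 <= profile <= 8:
        raise ValueError(f"profile out of range: {profile}")
    if not 0 <= level <= 99:
        raise ValueError(f"level out of range: {level}")
    return profile, level
```

With no parameters the function returns the baseline defaults, while `"profile=3; level=10"` yields profile 3 at level 10.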

NOTE: The above text will be replaced with a reference to the RFC describing the H263-2000 MIME media type 
as soon as this becomes available. 



C.2 MIME media type sp-midi 



MIME media type name: audio 
MIME subtype name: sp-midi 

Required parameters: none 

Optional parameters: none 

NOTE: The above text will be replaced with a reference to the RFC describing the sp-midi MIME media type as 
soon as this becomes available. 

Annex D (normative): 
Support for non-ISO code streams in MP4 files 



D.1 General 



The purpose of this annex is to define the necessary structure for integration of the H.263, AMR and AMR-WB media 
specific information in an MP4 file. Clauses D.2 to D.4 give some background information about the Sample 
Description atom, VisualSampleEntry atom and the AudioSampleEntry atom in the MPEG-4 file format. Then, the 
definitions of the SampleEntry atoms for AMR, AMR-WB and H.263 are given in clauses D.5 to D.8. 

AMR and AMR-WB data is stored in the stream according to the AMR and AMR-WB storage format for single 
channel header of Annex E [11], without the AMR magic numbers. 



D.2 Sample Description atom 



In an MP4 file, the Sample Description Atom gives detailed information about the coding type used, and any initialisation 
information needed for that coding. The Sample Description Atom can be found in the MP4 Atom Structure Hierarchy 
shown in figure D.1. 

Movie Atom 
    Track Atom 
        Media Atom 
            Media Information Atom 
                Sample Table Atom 
                    Sample Description Atom 

Figure D.1: MP4 Atom Structure Hierarchy 

The Sample Description Atom can have one or more SampleDescriptionEntry fields. Valid Sample Description Entry 
atoms already defined for MP4 are the AudioSampleEntry, VisualSampleEntry, HintSampleEntry and MPEGSampleEntry 
Atoms. The SampleDescriptionEntry Atoms for AMR and AMR-WB shall be AMRSampleEntry, and for H.263 shall 
be H263SampleEntry. 

The format of SampleDescriptionEntry and its fields are explained as follows: 

SampleDescriptionEntry ::= VisualSampleEntry | 
                           AudioSampleEntry | 
                           HintSampleEntry | 
                           MpegSampleEntry | 
                           H263SampleEntry | 
                           AMRSampleEntry 

Table D.1: SampleDescriptionEntry fields 

| Field             | Type | Details                                                                                    | Value |
|-------------------|------|--------------------------------------------------------------------------------------------|-------|
| VisualSampleEntry |      | Entry type for visual samples defined in the MPEG-4 specification.                         |       |
| AudioSampleEntry  |      | Entry type for audio samples defined in the MPEG-4 specification.                          |       |
| HintSampleEntry   |      | Entry type for hint track samples defined in the MPEG-4 specification.                     |       |
| MpegSampleEntry   |      | Entry type for MPEG related stream samples defined in the MPEG-4 specification.            |       |
| H263SampleEntry   |      | Entry type for H.263 visual samples defined in clause D.6 of the present document.         |       |
| AMRSampleEntry    |      | Entry type for AMR and AMR-WB speech samples defined in clause D.5 of the present document. |       |

From the above 6 atoms, only the VisualSampleEntry, AudioSampleEntry, H263SampleEntry and AMRSampleEntry 
atoms are taken into consideration, since MPEG specific streams and hint tracks are out of the scope of the present 
document. 



D.3 VisualSampleEntry atom 

The VisualSampleEntry Atom is defined as follows: 

VisualSampleEntry ::= AtomHeader 
                      Reserved_6 
                      Data-reference-index 
                      Reserved_16 
                      Width 
                      Height 
                      Reserved_4 
                      Reserved_4 
                      Reserved_4 
                      Reserved_2 
                      Reserved_32 
                      Reserved_2 
                      Reserved_2 
                      ESDAtom 



Table D.2: VisualSampleEntry fields 

| Field                | Type                       | Details                                                                                                          | Value      |
|----------------------|----------------------------|------------------------------------------------------------------------------------------------------------------|------------|
| AtomHeader.Size      | Unsigned int(32)           |                                                                                                                  |            |
| AtomHeader.Type      | Unsigned int(32)           |                                                                                                                  | 'mp4v'     |
| Reserved_6           | Unsigned int(8) [6]        |                                                                                                                  |            |
| Data-reference-index | Unsigned int(16)           | Index of the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. |            |
| Reserved_16          | Const unsigned int(32) [4] |                                                                                                                  |            |
| Width                | Unsigned int(16)           | Maximum width, in pixels, of the stream                                                                          |            |
| Height               | Unsigned int(16)           | Maximum height, in pixels, of the stream                                                                         |            |
| Reserved_4           | Const unsigned int(32)     |                                                                                                                  | 0x00480000 |
| Reserved_4           | Const unsigned int(32)     |                                                                                                                  | 0x00480000 |
| Reserved_4           | Const unsigned int(32)     |                                                                                                                  |            |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  | 1          |
| Reserved_32          | Const unsigned int(8) [32] |                                                                                                                  |            |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  | 24         |
| Reserved_2           | Const int(16)              |                                                                                                                  | -1         |
| ESDAtom              |                            | Atom containing an elementary stream descriptor for this stream.                                                 |            |





The stream type specific information is in the ESDAtom structure, which will be explained later. 

This version of the VisualSampleEntry, with explicit width and height, shall be used for MPEG-4 video streams 
conformant to this specification. 

NOTE: width and height parameters together may be used to allocate the necessary memory in the playback 
device without need to analyse the video stream. 



D.4 AudioSampleEntry atom 

The AudioSampleEntry Atom is defined as follows: 

AudioSampleEntry ::= AtomHeader 
                     Reserved_6 
                     Data-reference-index 
                     Reserved_8 
                     Reserved_2 
                     Reserved_2 
                     Reserved_4 
                     TimeScale 
                     Reserved_2 
                     ESDAtom 



Table D.3: AudioSampleEntry fields 

| Field                | Type                       | Details                                                                                                          | Value  |
|----------------------|----------------------------|------------------------------------------------------------------------------------------------------------------|--------|
| AtomHeader.Size      | Unsigned int(32)           |                                                                                                                  |        |
| AtomHeader.Type      | Unsigned int(32)           |                                                                                                                  | 'mp4a' |
| Reserved_6           | Unsigned int(8) [6]        |                                                                                                                  |        |
| Data-reference-index | Unsigned int(16)           | Index of the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. |        |
| Reserved_8           | Const unsigned int(32) [2] |                                                                                                                  |        |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  | 2      |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  | 16     |
| Reserved_4           | Const unsigned int(32)     |                                                                                                                  |        |
| TimeScale            | Unsigned int(16)           | Copied from track                                                                                                |        |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  |        |
| ESDAtom              |                            | Atom containing an elementary stream descriptor for this stream.                                                 |        |





The stream type specific information is in the ESDAtom structure, which will be explained later. 



D.5 AMRSampleEntry atom 

For narrow-band AMR, the atom type of the AMRSampleEntry Atom shall be 'samr'. For AMR wideband (AMR-WB), 
the atom type of the AMRSampleEntry Atom shall be 'sawb'. Each AMR or AMR-WB track shall be associated with a 
single AMRSampleEntry. 

The AMRSampleEntry Atom is defined as follows: 

AMRSampleEntry ::= AtomHeader 
                   Reserved_6 
                   Data-reference-index 
                   Reserved_8 
                   Reserved_2 
                   Reserved_2 
                   Reserved_4 
                   TimeScale 
                   Reserved_2 
                   AMRSpecificAtom 

Table D.4: AMRSampleEntry fields 

| Field                | Type                       | Details                                                                                                          | Value            |
|----------------------|----------------------------|------------------------------------------------------------------------------------------------------------------|------------------|
| AtomHeader.Size      | Unsigned int(32)           |                                                                                                                  |                  |
| AtomHeader.Type      | Unsigned int(32)           |                                                                                                                  | 'samr' or 'sawb' |
| Reserved_6           | Unsigned int(8) [6]        |                                                                                                                  |                  |
| Data-reference-index | Unsigned int(16)           | Index of the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. |                  |
| Reserved_8           | Const unsigned int(32) [2] |                                                                                                                  |                  |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  | 2                |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  | 16               |
| Reserved_4           | Const unsigned int(32)     |                                                                                                                  |                  |
| TimeScale            | Unsigned int(16)           | Copied from media header atom of this media                                                                      |                  |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  |                  |
| AMRSpecificAtom      |                            | Information specific to the decoder.                                                                             |                  |





Compared with the AudioSampleEntry Atom, the main difference in the AMRSampleEntry Atom is the replacement of 
the ESDAtom, which is specific to MPEG-4 systems, with an atom suitable for AMR and AMR-WB. The 
AMRSpecificAtom field structure is described in clause D.7. 



D.6 H263SampleEntry atom 

The atom type of the H263SampleEntry Atom shall be 's263'. 
The H263SampleEntry Atom is defined as follows: 

H263SampleEntry ::= AtomHeader 
                    Reserved_6 
                    Data-reference-index 
                    Reserved_16 
                    Width 
                    Height 
                    Reserved_4 
                    Reserved_4 
                    Reserved_4 
                    Reserved_2 
                    Reserved_32 
                    Reserved_2 
                    Reserved_2 
                    H263SpecificAtom 

Table D.5: H263SampleEntry fields 

| Field                | Type                       | Details                                                                                                          | Value      |
|----------------------|----------------------------|------------------------------------------------------------------------------------------------------------------|------------|
| AtomHeader.Size      | Unsigned int(32)           |                                                                                                                  |            |
| AtomHeader.Type      | Unsigned int(32)           |                                                                                                                  | 's263'     |
| Reserved_6           | Unsigned int(8) [6]        |                                                                                                                  |            |
| Data-reference-index | Unsigned int(16)           | Index of the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. |            |
| Reserved_16          | Const unsigned int(32) [4] |                                                                                                                  |            |
| Width                | Unsigned int(16)           | Maximum width, in pixels, of the stream                                                                          |            |
| Height               | Unsigned int(16)           | Maximum height, in pixels, of the stream                                                                         |            |
| Reserved_4           | Const unsigned int(32)     |                                                                                                                  | 0x00480000 |
| Reserved_4           | Const unsigned int(32)     |                                                                                                                  | 0x00480000 |
| Reserved_4           | Const unsigned int(32)     |                                                                                                                  |            |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  | 1          |
| Reserved_32          | Const unsigned int(8) [32] |                                                                                                                  |            |
| Reserved_2           | Const unsigned int(16)     |                                                                                                                  | 24         |
| Reserved_2           | Const int(16)              |                                                                                                                  | -1         |
| H263SpecificAtom     |                            | Information specific to the H.263 decoder.                                                                       |            |





Compared with the VisualSampleEntry Atom, the main difference in the H263SampleEntry Atom is the replacement of the 
ESDAtom, which is specific to MPEG-4 systems, with an atom suitable for H.263. The H263SpecificAtom field 
structure for H.263 is described in clause D.8. 
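
As an illustration of how a reader might tell the sample entries of clauses D.3 to D.6 apart, the following Python sketch inspects the AtomHeader. It is a sketch under stated assumptions: the names `read_atom_header`, `sample_entry_kind` and `SAMPLE_ENTRY_TYPES` are hypothetical, and the big-endian byte order is the usual MP4 convention rather than something stated in this annex.

```python
import struct

# Atom types taken from tables D.2 to D.5; the dictionary itself is illustrative.
SAMPLE_ENTRY_TYPES = {
    b"mp4v": "VisualSampleEntry",
    b"mp4a": "AudioSampleEntry",
    b"s263": "H263SampleEntry",
    b"samr": "AMRSampleEntry",  # narrow-band AMR
    b"sawb": "AMRSampleEntry",  # AMR-WB
}


def read_atom_header(data: bytes, offset: int = 0) -> tuple[int, bytes]:
    """Read AtomHeader.Size and AtomHeader.Type (two 32-bit fields)."""
    size, atom_type = struct.unpack_from(">I4s", data, offset)
    return size, atom_type


def sample_entry_kind(data: bytes, offset: int = 0) -> str:
    """Name the sample entry found at the given offset, or 'unknown'."""
    _, atom_type = read_atom_header(data, offset)
    return SAMPLE_ENTRY_TYPES.get(atom_type, "unknown")
```

A reader that does not recognise the atom type can use the size field to skip over the entry.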



D.7 AMRSpecificAtom field for AMRSampleEntry atom 

The AMRSpecificAtom fields for AMR and AMR-WB shall be as defined in table D.6. The AMRSpecificAtom for the 
AMRSampleEntry Atom shall always be included if the MP4 file contains AMR or AMR-WB media. 

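Table D.6: The AMRSpecificAtom fields for AMRSampleEntry 

| Field           | Type             | Details                                                         | Value  |
|-----------------|------------------|-----------------------------------------------------------------|--------|
| AtomHeader.Size | Unsigned int(32) |                                                                 |        |
| AtomHeader.Type | Unsigned int(32) |                                                                 | 'damr' |
| DecSpecificInfo | AMRDecSpecStruc  | Structure which holds the AMR and AMR-WB specific information   |        |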





AtomHeader Size and Type: indicate the size and type of the AMR decoder-specific atom. The type must be 'damr'. 
DecSpecificInfo: the structure where the AMR and AMR-WB stream specific information resides. 
The AMRDecSpecStruc is defined as follows: 

struct AMRDecSpecStruc { 
    Unsigned int (32) vendor 
    Unsigned int (8)  decoder_version 
    Unsigned int (16) mode_set 
    Unsigned int (8)  mode_change_period 
    Unsigned int (8)  frames_per_sample 
} 

The definitions of AMRDecSpecStruc members are as follows: 

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about 
the vendor whose codec is used to create the encoded data. It is an informative field which may be used by the decoding 
end. If a manufacturer already has a four character code, it is recommended that it uses the same code in this field. Else, 
it is recommended that the manufacturer creates a four character code which best addresses the manufacturer's name. It 
can be safely ignored. 

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way. 
This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder 
version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored. 

mode_set: the active codec modes. Each bit of the mode_set parameter corresponds to one mode. The bit index of the 
mode is calculated according to the 4 bit FT field of the AMR or AMR-WB frame structure. The mode_set bit structure 
is as follows: (B15xxxxxxB8B7xxxxxxB0) where B0 (Least Significant Bit) corresponds to Mode 0, and B8 
corresponds to Mode 8. 

The mapping of existing AMR modes to FT is given in table 1.a in [19]. A value of 0x81FF means all modes and 
comfort noise frames are possibly present in an AMR stream. 

The mapping of existing AMR-WB modes to FT is given in Table 1.a in TS 26.201 [37]. A value of 0x83FF means all 
modes and comfort noise frames are possibly present in an AMR-WB stream. 

As an example, if mode_set = 0000000110010101b, only Modes 0, 2, 4, 7 and 8 are present in the stream. 
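
The bit-to-mode mapping can be checked with a short Python sketch. The function name `active_modes` is hypothetical; the sketch simply lists the indices of the bits that are set, with bit 0 as the least significant bit.

```python
def active_modes(mode_set: int) -> list[int]:
    """Return the indices of the bits set in a 16-bit mode_set value."""
    if not 0 <= mode_set <= 0xFFFF:
        raise ValueError("mode_set is a 16-bit field")
    # Bit B0 (least significant) corresponds to Mode 0, B8 to Mode 8, and so on.
    return [bit for bit in range(16) if mode_set & (1 << bit)]
```

Applied to the example value 0000000110010101b, the function returns the indices 0, 2, 4, 7 and 8.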

mode_change_period: defines a number N, which restricts mode changes to multiples of N frames. If no 
restriction is applied, this value should be set to 0. If mode_change_period is not 0, the following restrictions apply to it 
according to the frames_per_sample field: 

    if (mode_change_period < frames_per_sample) 
        frames_per_sample = k x (mode_change_period) 
    else if (mode_change_period > frames_per_sample) 
        mode_change_period = k x (frames_per_sample) 

    where k: integer [2, ...] 

If mode_change_period is equal to frames_per_sample, then the mode is the same for all frames inside one sample. 
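
The restriction amounts to requiring that, when the two values differ, the larger is an integer multiple of the smaller. A minimal Python sketch of such a check follows; the function name `mode_change_period_is_valid` is hypothetical, and the sketch assumes frames_per_sample is a positive integer.

```python
def mode_change_period_is_valid(mode_change_period: int,
                                frames_per_sample: int) -> bool:
    """Check mode_change_period against frames_per_sample."""
    if frames_per_sample <= 0 or mode_change_period < 0:
        raise ValueError("unexpected field value")
    if mode_change_period == 0:
        return True  # no restriction on mode changes
    if mode_change_period == frames_per_sample:
        return True  # same mode for all frames inside one sample
    smaller, larger = sorted((mode_change_period, frames_per_sample))
    # larger = k x smaller with integer k; k >= 2 follows from larger > smaller.
    return larger % smaller == 0
```

For instance, a mode_change_period of 2 with 10 frames per sample passes the check (k = 5), whereas a period of 3 with 10 frames per sample does not.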

frames_per_sample: defines the number of frames to be considered as 'one sample' inside the MP4 file. This number 
shall be greater than 0 and less than 16. A value of 1 means each frame is treated as one sample. A value of 10 means 
that 10 frames (of duration 20 msec each) are put together and treated as one sample. It must be noted that, in this case, 
one sample duration is 20 (msec/frame) x 10 (frame) = 200 msec. For the last sample of the stream, the number of 
frames can be smaller than frames_per_sample, if the number of remaining frames is smaller than frames_per_sample. 
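
The sample duration arithmetic, including the shorter last sample, can be sketched in Python as follows. The function name `sample_durations_ms` is hypothetical, and the lower bound of 0 on frames_per_sample is assumed.

```python
FRAME_DURATION_MS = 20  # duration of one frame, as stated in the text


def sample_durations_ms(total_frames: int, frames_per_sample: int) -> list[int]:
    """Return the duration in msec of each sample of a stream."""
    if not 0 < frames_per_sample < 16:
        raise ValueError("frames_per_sample out of range")
    full_samples, remaining = divmod(total_frames, frames_per_sample)
    durations = [frames_per_sample * FRAME_DURATION_MS] * full_samples
    if remaining:
        # The last sample may hold fewer frames than frames_per_sample.
        durations.append(remaining * FRAME_DURATION_MS)
    return durations
```

A stream of 25 frames stored with frames_per_sample = 10 gives two samples of 200 msec followed by one of 100 msec.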

NOTE1: The "hinter", for the creation of the hint tracks, can use the information given by the AMRDecSpecStruc 
members. 

NOTE2: The following AMR MIME parameters are not relevant to PSS: {mode_set, mode_change_period, 
mode_change_neighbor}. PSS servers should not send these parameters in SDP, and PSS clients shall 
ignore these parameters if received. 
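
The layout of table D.6 and of the AMRDecSpecStruc can be sketched with the Python `struct` module. This is a sketch under stated assumptions, not a normative serialisation: the function name `pack_amr_specific_atom` is hypothetical, fields are assumed to be stored big-endian without padding, and AtomHeader.Size is assumed to count the whole atom including its 8-byte header, as is usual for MP4 atoms.

```python
import struct


def pack_amr_specific_atom(vendor: bytes, decoder_version: int, mode_set: int,
                           mode_change_period: int,
                           frames_per_sample: int) -> bytes:
    """Serialise an AMRSpecificAtom ('damr') from its AMRDecSpecStruc members."""
    if len(vendor) != 4:
        raise ValueError("vendor is a four character code")
    # vendor (32 bits), decoder_version (8), mode_set (16),
    # mode_change_period (8), frames_per_sample (8): 9 bytes in total.
    body = struct.pack(">4sBHBB", vendor, decoder_version, mode_set,
                       mode_change_period, frames_per_sample)
    header = struct.pack(">I4s", 8 + len(body), b"damr")
    return header + body
```

Under these assumptions the atom is always 17 bytes long: 8 bytes of header followed by 9 bytes of decoder-specific information.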



D.8 H263SpecificAtom field for H263SampleEntry atom 

The H263SpecificAtom fields for H.263 shall be as defined in table D.7. The H263SpecificAtom for the 
H263SampleEntry Atom shall always be included if the MP4 file contains H.263 media. 

The H263SpecificAtom for H.263 is composed of the following fields. 

Table D.7: The H263SpecificAtom fields for H263SampleEntry 

| Field           | Type             | Details                                              | Value  |
|-----------------|------------------|------------------------------------------------------|--------|
| AtomHeader.Size | Unsigned int(32) |                                                      |        |
| AtomHeader.Type | Unsigned int(32) |                                                      | 'd263' |
| DecSpecificInfo | H263DecSpecStruc | Structure which holds the H.263 specific information |        |
| BitrateAtom     |                  | Specific bitrate information (optional)              |        |





AtomHeader Size and Type: indicate the size and type of the H.263 decoder-specific atom. The type must be 'd263'. 
DecSpecificInfo: This is the structure where the H.263 stream specific information resides. 
H263DecSpecStruc is defined as follows: 

struct H263DecSpecStruc { 
    Unsigned int (32) vendor 
    Unsigned int (8)  decoder_version 
    Unsigned int (8)  H263_Level 
    Unsigned int (8)  H263_Profile 
} 



The definitions of H263DecSpecStruc members are as follows: 

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor field gives information about 
the vendor whose codec is used to create the encoded data. It is an informative field which may be used by the decoding 
end. If a manufacturer already has a four character code, it is recommended that it uses the same code in this field. Else, 
it is recommended that the manufacturer creates a four character code which best addresses the manufacturer's name. It 
can be safely ignored. 

decoder_version: version of the vendor's decoder which can decode the encoded stream in the best (i.e. optimal) way. 
This field is closely tied to the vendor field. It may give advantage to the vendor which has optimal encoder-decoder 
version pairs. The value is set to 0 if decoder version has no importance for the vendor. It can be safely ignored. 

H263_Level and H263_Profile: These two parameters define which H263 profile and level is used. These parameters 
are based on the MIME media type video/H263-2000. The profile and level specifications can be found in [23]. 

EXAMPLE 1: H.263 Baseline = {H263_Level = 10, H263_Profile = 0} 

EXAMPLE 2: H.263 Profile 3 @ Level 10 = {H263_Level = 10, H263_Profile = 3} 

NOTE: The "hinter", for the creation of the hint tracks, can use the information given by the H263DecSpecStruc 
members. 

The BitrateAtom field shall be as defined in table D.7.1. The BitrateAtom may be included if the MP4 file contains 
H.263 media. 

The BitrateAtom is composed of the following fields. 

Table D.7.1: The BitrateAtom fields 

| Field           | Type             | Details                                      | Value  |
|-----------------|------------------|----------------------------------------------|--------|
| AtomHeader.Size | Unsigned int(32) |                                              |        |
| AtomHeader.Type | Unsigned int(32) |                                              | 'bitr' |
| DecBitrateInfo  | DecBitrStruc     | Structure which holds the Bitrate information |        |





AtomHeader Size and Type: indicate the size and type of the bitrate atom. The type must be 'bitr'. 
DecBitrateInfo: This is the structure where the stream bitrate information resides. 
DecBitrStruc is defined as follows: 

struct DecBitrStruc { 
    Unsigned int (32) Avg_Bitrate 
    Unsigned int (32) Max_Bitrate 
} 

The definitions of DecBitrStruc members are as follows: 

Avg_Bitrate: the average bitrate in bits per second of this elementary stream. For streams with variable bitrate this 
value shall be set to zero. 

Max_Bitrate: the maximum bitrate in bits per second of this elementary stream in any time window of one second 
duration. 
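
The H263SpecificAtom and its optional BitrateAtom can be sketched with the Python `struct` module. This is a sketch under stated assumptions: the function names are hypothetical, fields are assumed to be big-endian without padding, each AtomHeader.Size is assumed to include the 8-byte header, and the BitrateAtom is assumed to follow the H263DecSpecStruc inside the 'd263' atom, in the order given by table D.7.

```python
import struct
from typing import Optional


def pack_bitrate_atom(avg_bitrate: int, max_bitrate: int) -> bytes:
    """Serialise a BitrateAtom ('bitr'); avg_bitrate is 0 for variable bitrate."""
    return struct.pack(">I4sII", 16, b"bitr", avg_bitrate, max_bitrate)


def pack_h263_specific_atom(vendor: bytes, decoder_version: int,
                            level: int, profile: int,
                            bitrate_atom: Optional[bytes] = None) -> bytes:
    """Serialise an H263SpecificAtom ('d263') from its H263DecSpecStruc members."""
    if len(vendor) != 4:
        raise ValueError("vendor is a four character code")
    # Member order follows the struct: vendor, decoder_version, level, profile.
    body = struct.pack(">4sBBB", vendor, decoder_version, level, profile)
    if bitrate_atom is not None:
        body += bitrate_atom  # optional BitrateAtom
    return struct.pack(">I4s", 8 + len(body), b"d263") + body
```

For H.263 Baseline (level 10, profile 0) without bitrate information, the resulting atom is 15 bytes long under these assumptions; adding a BitrateAtom extends it by 16 bytes.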



D.8a Timed Text Format 



This clause defines the format of timed text in downloaded files. In this release, timed text is downloaded, not 
streamed. 

Operators may specify additional rules and restrictions when deploying terminals, in addition to this specification, and 
behavior that is optional here may be mandatory for particular deployments. In particular, the required character set is 
almost certainly dependent on the geography of the deployment. 



D.8a.1 Unicode Support 



Text in this specification uses the Unicode 3.0 [30] standard. Terminals shall correctly decode both UTF-8 and UTF-16 
into the required characters. If a terminal receives a Unicode code, which it cannot display, it shall display a predictable 
result. It shall not treat multi-byte UTF-8 characters as a series of ASCII characters, for example. 

Authors should create fully-composed Unicode; terminals are not required to handle decomposed sequences for which 
there is a fully-composed equivalent. 

Terminals shall conform to the conformance statement in Unicode 3.0 section 3.1. 

Text strings for display and font names are uniformly coded in UTF-8, or start with a UTF-16 BYTE ORDER MARK 
(\uFEFF) and by that indicate that the string which starts with the byte order mark is in UTF-16. Terminals shall 
recognise the byte-order mark in this byte order; they are not required to recognise byte-reversed UTF-16, indicated by 
a byte-reversed byte-order mark. 
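
The encoding rule can be sketched in Python as follows. The function name `decode_text_string` is hypothetical. The sketch assumes that "this byte order" means the mark is stored as the bytes FE FF, and it substitutes the Unicode replacement character for undecodable input as one possible "predictable result".

```python
UTF16_BOM = b"\xfe\xff"  # \uFEFF stored most significant byte first (assumed)


def decode_text_string(data: bytes) -> str:
    """Decode a text string that is either UTF-8 or BOM-prefixed UTF-16."""
    if data.startswith(UTF16_BOM):
        # The byte order mark is not a character, so it is dropped.
        return data[len(UTF16_BOM):].decode("utf-16-be", errors="replace")
    # No byte order mark: the string is coded in UTF-8.
    return data.decode("utf-8", errors="replace")
```

A byte-reversed mark (FF FE) is not recognised by this sketch, which matches the minimum behaviour required of terminals.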

D.8a.2 Bytes, Characters, and Glyphs 



This clause uses these terms carefully. Since multi-byte characters are permitted (i.e. 16-bit Unicode characters), the 
number of characters in a string may not be the number of bytes. Also, a byte-order-mark is not a character at all, 
though it occupies two bytes. So, for example, storage lengths are specified as byte-counts, whereas highlighting is 
specified using character offsets. 

It should also be noted that in some writing systems the number of glyphs rendered might be different again. For 
example, in English, the characters 'fi' are sometimes rendered as a single ligature glyph. 

In this specification, the first character is at offset 0 in the string. In records specifying both a start and end offset, the 
end offset shall be greater than or equal to the start offset. In cases where several offset specifications occur in 
sequence, the start offset of an element shall be greater than or equal to the end offset of the preceding element. 
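
The distinction between byte counts and character offsets, and the ordering rules for offsets, can be sketched in Python. The function names `storage_length` and `offsets_are_valid` are hypothetical, and the FE FF byte order mark for UTF-16 is assumed.

```python
def storage_length(text: str, utf16: bool = False) -> int:
    """Return the byte count of a string as it would be stored."""
    if utf16:
        # Two bytes for the byte order mark, which is not itself a character.
        return len(b"\xfe\xff" + text.encode("utf-16-be"))
    return len(text.encode("utf-8"))


def offsets_are_valid(ranges: list[tuple[int, int]]) -> bool:
    """Check a sequence of (start, end) character offsets."""
    previous_end = 0  # the first character is at offset 0
    for start, end in ranges:
        if end < start or start < previous_end:
            return False
        previous_end = end
    return True
```

The two-character string consisting of the Euro sign followed by the digit 5 occupies 4 bytes in UTF-8 and 6 bytes in UTF-16 including the byte order mark, although its character offsets run from 0 to 2 in both cases.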

D.8a.3 Character Set Support 

All terminals shall be able to render Unicode characters in these ranges: 

a) basic ASCII and Latin-1 (\u0000 to \u00FF), though not all the control characters in this range are needed; 

b) the Euro currency symbol (\u20AC); 

c) telephone and ballot symbols (\u260E through \u2612). 

Support for the following characters is recommended but not required: 

a) miscellaneous technical symbols (\u2300 through \u2335) 

b) 'Zapf Dingbats': locations \u2700 through \u27AF, and the locations where some symbols have been relocated 
(e.g. \u2605, Black star). 

The private use characters \u0091 and \u0092, and the initial range of the private use area \uE000 through \uE0FF are 
reserved in this specification. For these Unicode values, and for control characters for which there is no defined 
graphical behaviour, the terminal shall not display any result: neither a glyph is shown nor is the current rendering 
position changed. 
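
The ranges above can be summarised in a small Python classifier. It is illustrative only: the function name `support_class` and the class labels are hypothetical, reserved values are checked first because \u0091 and \u0092 fall inside the Latin-1 range, and of the relocated Zapf Dingbats only \u2605, the one named in the text, is listed.

```python
def support_class(character: str) -> str:
    """Classify one character as reserved, required, recommended or optional."""
    code = ord(character)
    # Reserved values: the terminal shall not display any result.
    if code in (0x0091, 0x0092) or 0xE000 <= code <= 0xE0FF:
        return "reserved"
    # Basic ASCII and Latin-1, the Euro sign, telephone and ballot symbols.
    if code <= 0x00FF or code == 0x20AC or 0x260E <= code <= 0x2612:
        return "required"
    # Miscellaneous technical symbols and Zapf Dingbats.
    if 0x2300 <= code <= 0x2335 or 0x2700 <= code <= 0x27AF or code == 0x2605:
        return "recommended"
    return "optional"
```

Control characters inside the Latin-1 range are reported as "required" by this sketch even though not all of them are needed; a real terminal would treat those without defined graphical behaviour like the reserved values.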



D.8a.4 Font Support 



Fonts are specified in this specification by name, size, and style. There are three special names which shall be 
recognized by the terminal: Serif, Sans-Serif, and Monospace. It is strongly recommended that these be different fonts 
for the required characters from ASCII and Latin-1. For many other characters, the terminal may have a limited set or 
only a single font. Terminals requested to render a character where the selected font does not support that character 
should substitute a suitable font. This ensures that languages with only one font (e.g. Asian languages) or symbols for 
which there is only one form are rendered. 

Fonts are requested by name, in an ordered list. Authors should normally specify one of the special names last in the 
list. 

Terminals shall support a pixel size of 12 (on a 72dpi display, this would be a point size of 12). If a size is requested 
other than the size(s) supported by the terminal, the next smaller supported size should be used. If the requested size is 
smaller than the smallest supported size, the terminal should use the smallest supported size. 
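
The size substitution rule can be sketched in Python. The function name `choose_font_size` and the list of supported sizes used in the description below are hypothetical.

```python
def choose_font_size(requested: int, supported: list[int]) -> int:
    """Pick the font size a terminal should use for a requested size."""
    if not supported:
        raise ValueError("at least one supported size is needed")
    sizes = sorted(supported)
    not_larger = [size for size in sizes if size <= requested]
    if not_larger:
        # Exact match, or otherwise the next smaller supported size.
        return not_larger[-1]
    # Requested size is below the smallest supported size.
    return sizes[0]
```

A terminal supporting sizes 12, 16 and 24 would render a request for 14 at size 12, a request for 10 at size 12, and a request for 30 at size 24.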

Terminals shall support unstyled text for those characters it supports. It may also support bold, italic (oblique) and 
bold-italic. If a style is requested which the terminal does not support, it should substitute a supported style; a character 
shall be rendered if the terminal has that character in any style of any font. 

D.8a.5 Fonts and Metrics 

Within the sample description, a complete list of the fonts used in the samples is found. This enables the terminal to 
pre-load them, or to decide on font substitution. 




Terminals may use varying versions of the same font. For example, here is the same text rendered on two systems; it 
was authored on the first, where it just fitted into the text box. 

EXAMPLE: [Figure not reproduced: the same text string rendered on two systems, using different versions of the same font.] 



Authors should be aware of this possible variation, and provide text box areas with some 'slack' to allow for rendering 
variations. 



D.8a.6 Colour Support 



The colour of both text and background are indicated in this specification using RGB values. Terminals are not 
required to be able to display all colours in the RGB space. Terminals with a limited colour display, with only gray- 
scale display, and with only black-and-white are permissible. If a terminal has a limited colour capability it should 
substitute a suitable colour; dithering of text may be used but is not usually appropriate as it results in "fuzzy" display. 
If colour substitution is performed, the substitution shall be consistent: the same RGB colour shall result consistently in 
the same displayed colour. If the same colour is chosen for background and text, then the text shall be invisible (unless 
a style such as highlight changes its colour). If different colours are specified for the background and text, the terminal 
shall map these to different colours, so that the text is visible. 

Colours in this specification also have an alpha or transparency value. In this specification, a transparency value of 0 
indicates a fully transparent colour, and a value of 255 indicates fully opaque. Support for partial or full transparency is 
optional. 'Keying' text (text rendered on a transparent background) is done by using a background colour which is fully 
transparent. 'Keying' text over video or pictures, and support for transparency in general, can be complex and may 
require double-buffering, and its support is optional in the terminal. Content authors should beware that if they specify 
a colour which is not fully opaque, and the content is played on a terminal not supporting it, the affected area (the entire 
text box for a background colour) will be fully opaque and will obscure visual material behind it. Visual material with 
transparency is layered closer to the viewer than the material which it partially obscures. 

D.8a.7 Text rendering position and composition 

Text is rendered within a region (a concept derived from SMIL). There is a text box set within that region. This 
permits the terminal to position the text within the overall presentation, and also to render the text appropriately given 
the writing direction. For text written left to right, for example, the first character would be rendered at, or near, the left 
edge of the box, and with its baseline down from the top of the box by one baseline height (a value derived from the 
font and font size chosen). Similar considerations apply to the other writing directions. 

Within the region, text is rendered within a text box. There is a default text box set, which can be over-ridden by a 
sample. 

The text box is filled with the background colour; after that the text is painted in the text colour. If highlighting is 
requested one or both of these colours may vary. 

Terminals may choose to anti-alias their text, or not. 

The text region and layering are defined using structures from the ISO base media file format. 

This track header box is used for the text track:

aligned(8) class TrackHeaderBox
   extends FullBox('tkhd', version, flags) {
   if (version==1) {
      unsigned int(64) creation_time;
      unsigned int(64) modification_time;
      unsigned int(32) track_ID;
      const unsigned int(32) reserved = 0;
      unsigned int(64) duration;
   } else { // version==0
      unsigned int(32) creation_time;
      unsigned int(32) modification_time;
      unsigned int(32) track_ID;
      const unsigned int(32) reserved = 0;
      unsigned int(32) duration;
   }
   const unsigned int(32)[2] reserved = 0;
   int(16) layer;
   template int(16) alternate_group = 0;
   template int(16) volume = 0;
   const unsigned int(16) reserved = 0;
   template int(32)[9] matrix =
      { 0x00010000, 0, 0, 0, 0x00010000, 0, tx, ty, 0x40000000 };
      // unity matrix
   unsigned int(32) width;
   unsigned int(32) height;
}



Visually composed tracks including video and text are layered using the 'layer' value. This compares, for example, to 
z-index in SMIL. More negative layer values are towards the viewer. (This definition is compatible with that in 
ISO/MJ2). 

The region is defined by the track width and height, and translation offset. This corresponds to the SMIL region. The 
width and height are stored in the track header fields above. The sample description sets a text box within the region, 
which can be over-ridden by the samples. 

The translation values are stored in the track header matrix in the following positions: 

{ 0x00010000,0,0, 0,0x00010000,0, tx, ty, 0x40000000 } 

These values are fixed-point 16.16 values, here restricted to be integers (the lower 16 bits of each value shall be zero). 
The X axis increases from left to right; the Y axis from top to bottom. (This use of the matrix is conformant with 
ISO/MJ2.) 

So, for example, a centered region of size 200x20, positioned below a video of size 320x240, would have track_width
set to 200 (width = 0x00c80000), track_height set to 20 (height = 0x00140000), tx = (320-200)/2 = 60, and ty = 240.
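
The arithmetic of this example can be sketched in Python. This is only an illustration: the helper names `to_fixed_16_16` and `centred_region_below` are hypothetical and are not defined by this specification.

```python
def to_fixed_16_16(pixels: int) -> int:
    """Encode an integer pixel count as a 16.16 fixed-point value (lower 16 bits zero)."""
    return pixels << 16


def centred_region_below(video_w: int, video_h: int, region_w: int, region_h: int) -> dict:
    """Track header values for a text region centred horizontally below the video."""
    tx = (video_w - region_w) // 2   # horizontal offset that centres the region
    ty = video_h                     # the region starts directly under the video
    return {
        "width": to_fixed_16_16(region_w),
        "height": to_fixed_16_16(region_h),
        "tx": to_fixed_16_16(tx),
        "ty": to_fixed_16_16(ty),
    }


# The 200x20 region below a 320x240 video from the example above.
region = centred_region_below(320, 240, 200, 20)
print({name: hex(value) for name, value in region.items()})
```

The width and height come out as 0x00c80000 and 0x00140000, matching the values quoted in the example.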

Since matrices are not used on the video tracks, all video tracks are set at the coordinate origin. Figure D.2 provides an 
overview: 

Figure D.2: Illustration of text rendering position and composition

The top and left positions of the text track are determined by tx and ty, which are the translation values from the
coordinate origin (since the video track is at the origin, this is also the offset from the video track). The default text box 
set in the sample description sets the rendering area unless over-ridden by a 'tbox' in the text sample. The box values 
are defined as the relative values from the top and left positions of the text track. 

It should be noted that this only specifies the relationship of the tracks within a single 3GP (MP4) file. If a SMIL
presentation lays out multiple files, their relative position is set by the SMIL regions. Each file is assigned to a region,
and then within those regions the spatial relationship of the tracks is defined. 

D.8a.8 Marquee Scrolling



Text can be 'marquee' scrolled in this specification (compare this to Internet Explorer's marquee construction). When 
scrolling is performed, the terminal first calculates the position in which the text would be displayed with no scrolling 
requested. Then: 

a) If scroll-in is requested, the text is initially invisible, just outside the text box, and enters the box in the indicated 
direction, scrolling until it is in the normal position; 

b) If scroll-out is requested, the text scrolls from the normal position, in the indicated direction, until it is 
completely outside the text box. 

The rendered text is clipped to the text box in each display position, as always. This means that it is possible to scroll a 
string which is longer than can fit into the text box, progressively disclosing it (for example, like a ticker-tape). Note 
that both scroll in and scroll out may be specified; the text scrolls continuously from its invisible initial position, 
through the normal position, and out to its final position. 

If a scroll-delay is specified, the text stays steady in its normal position (not initial position) for the duration of the 
delay; so the delay is after a scroll-in but before a scroll-out. This means that the scrolling is not continuous if both are 
specified. So without a delay, the text is in motion for the duration of the sample. For a scroll in, it reaches its normal 
position at the end of the sample duration; with a delay, it reaches its normal position before the end of the sample 
duration, and remains in its normal position for the delay duration, which ends at the end of the sample duration. 
Similarly for a scroll out, the delay happens in its normal position before scrolling starts. If both scroll in, and scroll out 
are specified, with a delay, the text scrolls in, stays stationary at the normal position for the delay period, and then 
scrolls out - all within the sample duration. 

The speed of scrolling is calculated so that the complete operation takes place within the duration of the sample. 
Therefore the scrolling has to occur within the time left after scroll-delay has been subtracted from the sample duration. 
Note that the time it takes to scroll a string may depend on the rendered length of the actual text string. Authors should
consider whether the resulting scrolling speed will exceed that at which text on a wireless terminal remains
readable.

Terminals may use simple algorithms to determine the actual scroll speed. For example, the speed may be determined 
by moving the text an integer number of pixels in every update cycle. Terminals should choose a scroll speed which is 
as fast or faster than needed so that the scroll operation completes within the sample duration. 
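
One possible way for a terminal to pick such a speed is sketched below in Python. The function name, the fixed update rate and the pixel distance are assumptions made for illustration; this specification does not mandate any particular algorithm.

```python
import math


def scroll_step_pixels(distance_px: int, sample_duration: int, scroll_delay: int,
                       timescale: int, updates_per_second: int) -> int:
    """Whole pixels to move per update so that the scroll finishes within the sample.

    sample_duration and scroll_delay are in track timescale units; the time
    available for motion is what is left once the delay has been subtracted.
    """
    scroll_time = sample_duration - scroll_delay
    if scroll_time <= 0:
        raise ValueError("scroll delay leaves no time for scrolling")
    updates = max(1, (scroll_time * updates_per_second) // timescale)
    # Round up so that the terminal is as fast as, or faster than, strictly needed.
    return math.ceil(distance_px / updates)


# 176 pixels to cover, a 5 s sample with a 1 s delay, timescale 1000, 10 updates per second.
step = scroll_step_pixels(176, 5000, 1000, 1000, 10)
```

When both scroll-in and scroll-out are requested, `distance_px` in this sketch would be the sum of the two distances, since both movements share the remaining time.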

Terminals are not required to handle dynamic or stylistic effects such as highlight, dynamic highlight, or href links on 
scrolled text. 

The scrolling direction is set by a two-bit field, with the following possible values: 

00b - text is vertically scrolled up ('credits style'), entering from the bottom and leaving towards
the top.

01b - text is horizontally scrolled ('marquee style'), entering from the right and leaving towards the left.

10b - text is vertically scrolled down, entering from the top and leaving towards the bottom.

11b - text is horizontally scrolled, entering from the left and leaving towards the right.



D.8a.9 Language 



The human language used in this stream is declared by the language field of the media-header atom in this track. It is 
an ISO 639/T 3-letter code. The knowledge of the language used might assist searching, or speaking the text. 
Rendering is language neutral. Note that the values 'und' (undetermined) and 'mul' (multiple languages) might occur. 



D.8a.10 Writing direction



Writing direction specifies the way in which the character position changes after each character is rendered. It also will 
imply a start-point for the rendering within the box. 

Terminals shall support the determination of writing direction, for those characters they support, according to the
Unicode 3.0 specification. Note that the only required characters can all be rendered using left-right behaviour. A 
terminal which supports characters with right-left writing direction shall support the right-left composition rules 
specified in Unicode. 

Terminals may also set, or allow the user to set, an overall writing direction, either explicitly or implicitly (e.g. by the 
language selection). This affects layout. For example, if upper-case letters are left-right, and lower-case right-left, and
the Unicode string ABCdefGHI is to be rendered, it would appear as ABCfedGHI on a terminal with overall left-right
writing (English, for example) and GHIfedABC on a system with overall right-left (Hebrew, for example).

Terminals are not required to support the bi-directional ordering codes (\u200E, \u200F and \u202A through \u202E). 

If vertical text is requested by the content author, characters are laid out vertically from top to bottom. The terminal 
may choose to render different glyphs for this writing direction (e.g. a horizontal parenthesis), but in general the glyphs 
should not be rotated. The direction in which lines advance (left-right, as used for European languages, or right-left, as 
used for Asian languages) is set by the terminal, possibly by a direct or indirect user preference (e.g. a language setting). 
Terminals shall support vertical writing of the required character set. It is recommended that terminals support vertical 
writing of text in those languages commonly written vertically (e.g. Asian languages). If vertical text is requested for 
characters which the terminal cannot render vertically, the terminal may behave as if the characters were not available. 



D.8a.11 Text wrap 



Automatic wrapping of text from line to line is complex, and can require hyphenation rules and other complex 
language-specific criteria. For these reasons, text is not wrapped in this specification. If a string is too long to be drawn 
within the box, it is clipped. The terminal may choose whether to clip at the pixel boundary, or to render only whole 
glyphs. 

There may be multiple lines of text in a sample (hard wrap). Terminals shall start a new line for the Unicode characters 
line separator (\u2028), paragraph separator (\u2029) and line feed (\u000A). It is recommended that terminals follow 
Unicode Technical Report 13 [48]. Terminals should treat carriage return (\u000D), next line (\u0085) and CR+LF 
(\u000D\u000A) as new line. 
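
The hard-wrap rule can be sketched as follows in Python; the function name `split_lines` is hypothetical. CR+LF is matched first so that it produces a single line break rather than two.

```python
import re

# Characters that start a new line; CR+LF is listed first so that it counts once.
_NEW_LINE = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")


def split_lines(text: str) -> list:
    """Split sample text into display lines at hard line breaks (no automatic wrapping)."""
    return _NEW_LINE.split(text)


lines = split_lines("one\u2028two\r\nthree\nfour")
```

Each resulting line would then be clipped to the text box independently, since no automatic wrapping is performed.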

D.8a.12 Highlighting, Closed Caption, and Karaoke

Text may be highlighted for emphasis. Since this is a non-interactive system, solely for text display, the utility of this 
function may be limited. 

Dynamic highlighting, used for Closed Caption and Karaoke highlighting, is an extension of highlighting. Successive
contiguous sub-strings of the text sample are highlighted at the specified times. 

D.8a.13 Media Handler

A text stream is its own unique stream type. For the 3GPP file format, the handler-type within the 'hdlr' atom shall be 
'text'. 

D.8a.14 Media Handler Header

The 3G text track uses an empty null media header ('nmhd'), called Mpeg4MediaHeaderAtom in the MP4 
specification, in common with other MPEG streams. 

aligned(8) class Mpeg4MediaHeaderAtom
   extends FullAtom('nmhd', version = 0, flags) {
}

D.8a.15 Style record

Both the sample format and the sample description contain style records, and so it is defined once here for compactness. 

aligned(8) class StyleRecord {
   unsigned int(16) startChar;
   unsigned int(16) endChar;
   unsigned int(16) font-ID;
   unsigned int(8) face-style-flags;
   unsigned int(8) font-size;
   unsigned int(8) text-color-rgba[4];
}

startChar:        character offset of the beginning of this style run (always 0 in a sample description)

endChar:          first character offset to which this style does not apply (always 0 in a sample description); shall be
                  greater than or equal to startChar. All characters, including line-break characters and any other
                  non-printing characters, are included in the character counts.

font-ID:          font identifier from the font table; in a sample description, this is the default font

face style flags: in the absence of any bits set, the text is plain
                  1 bold
                  2 italic
                  4 underline

font-size:        font size (nominal pixel size, in essentially the same units as the width and height)

text-color-rgba:  rgb colour, 8 bits each of red, green, blue, and an alpha (transparency) value

Terminals shall support plain text, and underlined horizontal text, and may support bold, italic and bold-italic depending 
on their capabilities and the font selected. If a style is not supported, the text shall still be rendered in the closest style 
available. 



D.8a.16 Sample Description Format



The sample table box ('stbl') contains sample descriptions for the text track. Each entry is a sample entry box of type 
'tx3g'. This name defines the format both of the sample description and the samples associated with that sample 
description. Terminals shall not attempt to decode or display sample descriptions with unrecognised names, nor the 
samples attached to those sample descriptions. 

It starts with the standard fields (the reserved bytes and the data reference index), and then some text-specific fields. 
Some fields can be overridden or supplemented by additional boxes within the text sample itself. These are discussed 
below. 

There can be multiple text sample descriptions in the sample table. If the overall text characteristics do not change from 
one sample to the next, the same sample description is used. Otherwise, a new sample description is added to the table. 
Not all changes to text characteristics require a new sample description, however. Some characteristics, such as font 
size, can be overridden on a character-by-character basis. Some, such as dynamic highlighting, are not part of the text 
sample description and can be changed dynamically. 

The TextDescription extends the regular sample entry with the following fields. 

class FontRecord {
   unsigned int(16) font-ID;
   unsigned int(8) font-name-length;
   unsigned int(8) font[font-name-length];
}

class FontTableBox() extends Box('ftab') {
   unsigned int(16) entry-count;
   FontRecord font-entry[entry-count];
}

class BoxRecord {
   signed int(16) top;
   signed int(16) left;
   signed int(16) bottom;
   signed int(16) right;
}

class TextSampleEntry() extends SampleEntry('tx3g') {
   unsigned int(32) displayFlags;
   signed int(8) horizontal-justification;
   signed int(8) vertical-justification;
   unsigned int(8) background-color-rgba[4];
   BoxRecord default-text-box;
   StyleRecord default-style;
   FontTableBox font-table;
}



displayFlags:
   scroll In                0x00000020
   scroll Out               0x00000040
   scroll direction         0x00000180   // see above for values
   continuous karaoke       0x00000800
   write text vertically    0x00020000

horizontal and vertical justification:   // two eight-bit values from the following list:
   left, top                0
   centered                 1
   bottom, right            -1

background-color-rgba: 

rgb color, 8 bits each of red, green, blue, and an alpha (transparency) value 

Default text box: the default text box is set by four values, relative to the text region; it may be over-ridden in 
samples; 

style record of default style: startChar and endChar shall be zero in a sample description 

The text box is inset within the region defined by the track translation offset, width, and height. The values in the box
are relative to the track region, and are uniformly coded with respect to the pixel grid. So, for example, the default text
box for a track at the top left of the track region and 50 pixels high and 100 pixels wide is { 0, 0, 50, 100 }.
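
The displayFlags bit values listed above can be decoded as in the Python sketch below. The function and constant names are hypothetical, and the sketch assumes that the two-bit scroll direction value occupies the bits selected by the mask 0x00000180, with the more significant bit at 0x00000100.

```python
SCROLL_IN = 0x00000020
SCROLL_OUT = 0x00000040
SCROLL_DIRECTION_MASK = 0x00000180
SCROLL_DIRECTION_SHIFT = 7          # assumption: the field starts at bit 7
CONTINUOUS_KARAOKE = 0x00000800
WRITE_VERTICALLY = 0x00020000

# Direction of text movement for each two-bit value.
SCROLL_DIRECTIONS = {
    0b00: "up",              # enters from the bottom
    0b01: "right-to-left",   # enters from the right
    0b10: "down",            # enters from the top
    0b11: "left-to-right",   # enters from the left
}


def decode_display_flags(flags: int) -> dict:
    """Break the 32-bit displayFlags field into its individual settings."""
    direction = (flags & SCROLL_DIRECTION_MASK) >> SCROLL_DIRECTION_SHIFT
    return {
        "scroll_in": bool(flags & SCROLL_IN),
        "scroll_out": bool(flags & SCROLL_OUT),
        "scroll_direction": SCROLL_DIRECTIONS[direction],
        "continuous_karaoke": bool(flags & CONTINUOUS_KARAOKE),
        "write_vertically": bool(flags & WRITE_VERTICALLY),
    }


# Marquee-style text that scrolls in from the right and out to the left.
settings = decode_display_flags(SCROLL_IN | SCROLL_OUT | (0b01 << SCROLL_DIRECTION_SHIFT))
```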

A font table shall follow these fields, to define the complete set of fonts used. The font table is an atom of type 'ftab'. 
Every font used in the samples is defined here by name. Each entry consists of a 16-bit local font identifier, and a font 
name, expressed as a string, preceded by an 8-bit field giving the length of the string in bytes. The name is expressed in 
UTF-8 characters, unless preceded by a UTF-16 byte-order-mark, whereupon the rest of the string is in 16-bit Unicode 
characters. The string should be a comma separated list of font names to be used as alternative fonts, in preference
order. The special names "Serif", "Sans-serif" and "Monospace" may be used. The terminal should use the first font in
the list which it can support; if it cannot support any for a given character, but it has a font which can, it should use that 
font. Note that this substitution is technically character by character, but terminals are encouraged to keep runs of 
characters in a consistent font where possible. 
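
A minimal Python sketch of reading the font table payload (the bytes following the 'ftab' box header) is given below. The function name `parse_font_table` is hypothetical, and splitting the name string on commas follows the recommendation above rather than a requirement.

```python
import struct


def parse_font_table(payload: bytes) -> dict:
    """Map each font-ID to its list of alternative font names, in preference order."""
    (entry_count,) = struct.unpack_from(">H", payload, 0)
    offset = 2
    fonts = {}
    for _ in range(entry_count):
        font_id, name_length = struct.unpack_from(">HB", payload, offset)
        offset += 3
        raw = payload[offset:offset + name_length]
        offset += name_length
        if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
            name = raw.decode("utf-16")   # the byte-order mark selects the byte order
        else:
            name = raw.decode("utf-8")
        fonts[font_id] = [part.strip() for part in name.split(",")]
    return fonts


font_name = b"Helvetica,Sans-serif"
table = struct.pack(">H", 1) + struct.pack(">HB", 1, len(font_name)) + font_name
fonts = parse_font_table(table)
```

A terminal would then walk the list for the requested font-ID and use the first name it can support.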



D.8a.17 Sample Format



Each sample in the media data consists of a string of text, optionally followed by sample modifier boxes. 

For example, if one word in the sample has a different size than the others, a 'styl' box is appended to that sample, 
specifying a new text style for those characters, and for the remaining characters in the sample. This overrides the style 
in the sample description. These boxes are present only if they are needed. If all text conforms to the sample 
description, and no characteristics are applied that the sample description does not cover, no boxes are inserted into the 
sample data. 

class TextSampleModifierBox(type) extends Box(type) {
}

class TextSample {
   unsigned int(16) text-length;
   unsigned int(8) text[text-length];
   TextSampleModifierBox text-modifier[];   // to end of the sample
}

The initial string is preceded by a 16-bit count of the number of bytes in the string. There is no need for null termination
of the text string. The sample size table provides the complete byte-count of each sample, including the trailing modifier 
boxes; by comparing the string length and the sample size, you can determine how much space, if any, is left for 
modifier boxes. 

Authors should limit the string in each text sample to not more than 2048 bytes, for maximum terminal interoperability. 

Any unrecognised box found in the text sample should be skipped and ignored, and processing continue as if it were not 
there. 
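
The sample layout described above can be walked as in the following Python sketch. The function name `parse_text_sample` is hypothetical, and two simplifications are assumed: the text is decoded as UTF-8, and every modifier box uses a plain 32-bit size followed by a four-character type.

```python
import struct


def parse_text_sample(sample: bytes):
    """Return (text, [(box_type, payload), ...]) for one text sample."""
    (text_length,) = struct.unpack_from(">H", sample, 0)
    text_end = 2 + text_length
    text = sample[2:text_end].decode("utf-8")   # assumption: UTF-8 text
    boxes = []
    offset = text_end
    # Whatever follows the string, up to the sample size, is modifier boxes.
    while offset + 8 <= len(sample):
        size, box_type = struct.unpack_from(">I4s", sample, offset)
        if size < 8 or offset + size > len(sample):
            raise ValueError("malformed modifier box")
        boxes.append((box_type.decode("ascii"), sample[offset + 8:offset + size]))
        offset += size
    return text, boxes


# "Hello" followed by one 'hlit' box highlighting characters 0..4.
sample = struct.pack(">H", 5) + b"Hello" + struct.pack(">I4sHH", 12, b"hlit", 0, 5)
text, boxes = parse_text_sample(sample)
```

A caller would dispatch on the box type and simply ignore any type it does not recognise, which gives the skipping behaviour required for unrecognised boxes.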

D.8a.17.1 Sample Modifier Boxes

D.8a.17.1.1 Text Style

'styl' 

This specifies the style of the text. It consists of a series of style records as defined above, preceded by a 16-bit count of 
the number of style records. Each record specifies the starting and ending character positions of the text to which it 
applies. The styles shall be ordered by starting character offset, and the starting offset of one style record shall be
greater than or equal to the ending character offset of the preceding record; style records shall not overlap their
character ranges.

class TextStyleBox() extends TextSampleModifierBox('styl') {
   unsigned int(16) entry-count;
   StyleRecord text-styles[entry-count];
}
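
The ordering and non-overlap rule for the style records of a 'styl' box can be checked mechanically. The Python sketch below is one way to do so; the function name `style_runs_are_valid` is hypothetical.

```python
def style_runs_are_valid(runs) -> bool:
    """runs: (startChar, endChar) pairs in the order in which they are stored in the box."""
    previous_end = 0
    for start_char, end_char in runs:
        if end_char < start_char:       # endChar shall be >= startChar
            return False
        if start_char < previous_end:   # out of order, or overlapping the previous run
            return False
        previous_end = end_char
    return True
```

For example, the runs (0, 4) and (4, 9) are acceptable because endChar is exclusive, whereas (0, 5) followed by (3, 9) overlap.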



D.8a.17.1.2 Highlight 

'hlit' - Specifies highlighted text: the atom contains two 16-bit integers, the starting character to highlight, and the first 
character with no highlighting (e.g. values 4, 6 would highlight the two characters 4 and 5). The second value may be 
the number of characters in the text plus one, to indicate that the last character is highlighted. 

class TextHighlightBox() extends TextSampleModifierBox('hlit') {
   unsigned int(16) startcharoffset;
   unsigned int(16) endcharoffset;
}

class TextHilightColorBox() extends TextSampleModifierBox('hclr') {
   unsigned int(8) highlight_color_rgba[4];
}

highlight_color_rgba:
   rgb color, 8 bits each of red, green, blue, and an alpha (transparency) value



The TextHilightColorBox may be present when the TextHighlightBox or TextKaraokeBox is present in a text sample.
It is recommended that terminals use the following rules to determine the displayed effect when highlight is requested: 

a) if a highlight colour is not specified, then the text is highlighted using a suitable technique such as inverse video: 
both the text colour and the background colour change. 

b) if a highlight colour is specified, the background colour is set to the highlight colour for the highlighted 
characters; the text colour does not change. 

Terminals do not need to handle text that is both scrolled and either statically or dynamically highlighted. Content 
authors should avoid specifying both scroll and highlight for the same sample. 

D.8a.17.1.3 Dynamic Highlight

'krok' - Karaoke, closed caption, or dynamic highlighting. The number of highlight events is specified, and each event
is specified by a starting and ending character offset and an end time for the event. The start time is either the sample
start time or the end time of the previous event. The specified characters are highlighted from the previous end-time
(initially the beginning of this sample's time), to the end time. The times are all specified relative to the sample's time;
that is, a time of 0 represents the beginning of the sample time. The times are measured in the timescale of the track.

The atom starts with the start-time offset of the first highlight event, a 16-bit count of the event count, and then that 
number of 8-byte records. Each record contains the end-time offset as a 32-bit number, and the text start and end 
values, each as a 16-bit number. These values are specified as in the highlight record - the offset of the first character to 
highlight, and the offset of the first character not highlighted. The special case, where the startcharoffset equals the
endcharoffset, can be used to pause during or at the beginning of dynamic highlighting. The records shall be ordered
and not overlap, as in the highlight record. The time in each record is the end time of this highlight event; the first 
highlight event starts at the indicated start-time offset from the start time of the sample. The time values are in the units 
expressed by the timescale of the track. The time values shall not exceed the duration of the sample. 

The continuouskaraoke flag controls whether to highlight only those characters (continuouskaraoke = 0) selected by a 
karaoke entry, or the entire string from the beginning up to the characters highlighted (continuouskaraoke = 1) at any 
given time. In other words, the flag specifies whether karaoke should ignore the starting offset and highlight all text 
from the beginning of the sample to the ending offset. 

Karaoke highlighting is usually achieved by using the highlight colour as the text colour, without changing the 
background. 

At most one dynamic highlight ('krok') atom may occur in a sample. 

class TextKaraokeBox() extends TextSampleModifierBox('krok') {
   unsigned int(32) highlight-start-time;
   unsigned int(16) entry-count;
   for (i=1; i<=entry-count; i++) {
      unsigned int(32) highlight-end-time;
      unsigned int(16) startcharoffset;
      unsigned int(16) endcharoffset;
   }
}


D.8a.17.1.4 Scroll Delay 

'dlay' - Specifies a delay after a Scroll In and/or before Scroll Out. A 32-bit integer specifying the delay, in the units of 
the timescale of the track. The default delay, in the absence of this box, is 0. 

class TextScrollDelayBox() extends TextSampleModifierBox('dlay') {
   unsigned int(32) scroll-delay;
}



D.8a.17.1.5 HyperText 

'href' - HyperText link. The existence of the hypertext link is visually indicated in a suitable style (e.g. underlined blue
text). 

This box contains these values: 

startCharOffset: - the start offset of the text to be linked 

endCharOffset: - the end offset of the text (start offset + number of characters) 

URLLength:- the number of bytes in the following URL 

URL: UTF-8 characters - the linked-to URL 

altLength:- the number of bytes in the following "alt" string 

altstring: UTF-8 characters - an "alt" string for user display 

The URL should be an absolute URL, as the context for a relative URL may not always be clear. 

The "alt" string may be used as a tool-tip or other visual clue, as a substitute for the URL, if desired by the terminal, to 
display to the user as a hint on where the link refers. 

Hypertext-linked text should not be scrolled; not all terminals can display this or manage the user interaction to
determine whether the user has interacted with moving text. It is also hard for the user to interact with scrolling text.

class TextHyperTextBox() extends TextSampleModifierBox('href') {
   unsigned int(16) startcharoffset;
   unsigned int(16) endcharoffset;
   unsigned int(8) URLLength;
   unsigned int(8) URL[URLLength];
   unsigned int(8) altLength;
   unsigned int(8) altstring[altLength];
}
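
As an illustration, the payload of an 'href' box (the bytes after the box header) could be read as in the Python sketch below; the function name `parse_href` and the example URL are hypothetical.

```python
import struct


def parse_href(payload: bytes):
    """Return (start_char, end_char, url, alt) from an 'href' box payload."""
    start_char, end_char, url_length = struct.unpack_from(">HHB", payload, 0)
    offset = 5
    url = payload[offset:offset + url_length].decode("utf-8")
    offset += url_length
    alt_length = payload[offset]    # 8-bit length of the "alt" string
    offset += 1
    alt = payload[offset:offset + alt_length].decode("utf-8")
    return start_char, end_char, url, alt


url_bytes = b"http://example.com"
alt_bytes = b"Home"
href_payload = (struct.pack(">HHB", 0, 4, len(url_bytes)) + url_bytes
                + bytes([len(alt_bytes)]) + alt_bytes)
link = parse_href(href_payload)
```

Because both lengths are 8-bit fields, neither the URL nor the "alt" string can exceed 255 bytes.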



D.8a.17.1.6 Textbox 

'tbox' - text box over-ride. This over-rides the default text box set in the sample description. 

class TextboxBox() extends TextSampleModifierBox('tbox') {
   BoxRecord text-box;
}



D.8a.17.1.7 Blink 

'blnk' - Blinking text. This requests blinking text for the indicated character range. Terminals are not required to 
support blinking text, and the precise way in which blinking is achieved, and its rate, is terminal-dependent. 

class BlinkBox() extends TextSampleModifierBox('blnk') {
   unsigned int(16) startcharoffset;
   unsigned int(16) endcharoffset;
}



D.8a.18 Combinations of features

Two modifier boxes of the same type shall not be applied to the same character (e.g. it is not permitted to have two href
links from the same text). As the 'hclr', 'dlay' and 'tbox' are globally applied to the whole text in a sample, two
modifier boxes of the same type shall not be present within a sample.

Table D.8 details the effects of multiple options: 

Table D.8: Combinations of features

                                      First sample modifier atom
Second sample    Sample description
modifier atom    style record         styl   hlit   krok   href   blnk
styl             1                    3
hlit                                         3
krok                                         4      3
href             2                    2             5      3
blnk                                  6      6      6      6      6



1. The sample description provides the default style; the style records over-ride this for the selected characters. 

2. The terminal over-rides the chosen style for HREF links. 

3. Two records of the same type cannot be applied to the same character. 

4. Dynamic and static highlighting must not be applied to the same text. 

5. Dynamic highlighting and linking must not be applied to the same text. 

6. Blinking text is optional, particularly when requested in combination with other features. 

D.9 File Identification



3GPP multimedia files can be identified using several mechanisms. When stored in traditional computer file systems, 
these files should be given the file extension ".3gp" (readers should allow mixed case for the alphabetic characters). 
The MIME types "video/3gpp" (for visual or audio/visual content, where visual includes both video and timed text) and 
"audio/3gpp" (for purely audio content) are expected to be registered and used. 

A file-type atom, as defined in the JPEG 2000 specification [36], shall be present in conforming files. The file type box
'ftyp' shall occur before any variable-length box (e.g. movie, free space, media data). Only a fixed-size box such as a 
file signature, if required, may precede it. 

The brand identifier for this specification is '3gp5'. This brand identifier must occur in the compatible brands list, and 
may also be the primary brand. If the file is also conformant to release 4 of this specification, it is recommended that 
the Release 4 brand '3gp4' also occur in the compatible brands list; if 3gp4 is not in the compatible brand list the file 
will not be processed by a Release 4 reader. Readers should check the compatible brands list for the identifiers they 
recognize, and not rely on the file having a particular primary brand, for maximum compatibility. Files may be 
compatible with more than one brand, and have a 'best use' other than this specification, yet still be compatible with this 
specification. 

Table D.9: The File-Type atom

Field              Type               Details                                Value
AtomHeader.Size    Unsigned int(32)
AtomHeader.Type    Unsigned int(32)                                          'ftyp'
Brand              Unsigned int(32)   The major or 'best use' of this file
MinorVersion       Unsigned int(32)
CompatibleBrands   Unsigned int(32)   A list of brands, to end of the atom





Brand: Identifies the 'best use' of this file. The brand should match the file extension. For files with extension '.3gp' 
and conforming to this specification, the brand shall be '3gp4'. 

MinorVersion: This identifies the minor version of the brand. For files with brand '3gpZ', where Z is a digit, and 
conforming to release Z.x.y, this field takes the value x*256 + y. 

CompatibleBrands: a list of brand identifiers (to the end of the atom). '3gp5' shall be a member of this list. 
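
The layout of Table D.9 and the MinorVersion formula can be sketched in Python as follows. The function names are hypothetical, and the brand values passed in the example are chosen only for illustration.

```python
import struct


def minor_version(x: int, y: int) -> int:
    """MinorVersion for a file conforming to release Z.x.y."""
    return x * 256 + y


def build_ftyp(brand: bytes, minor: int, compatible: list) -> bytes:
    """Serialise a file-type atom: size, 'ftyp', brand, minor version, brand list."""
    body = brand + struct.pack(">I", minor) + b"".join(compatible)
    return struct.pack(">I4s", 8 + len(body), b"ftyp") + body


def compatible_brands(atom: bytes) -> list:
    """Return the compatible brands, which run to the end of the atom."""
    size, atom_type = struct.unpack_from(">I4s", atom, 0)
    if atom_type != b"ftyp":
        raise ValueError("not a file-type atom")
    # The list starts after size (4), type (4), brand (4) and minor version (4).
    return [atom[i:i + 4] for i in range(16, size, 4)]


# A file conforming to version 5.2.0 that is also readable by Release 4 readers.
ftyp = build_ftyp(b"3gp5", minor_version(2, 0), [b"3gp5", b"3gp4"])
brands = compatible_brands(ftyp)
```

A reader following the recommendation above would test `b"3gp5" in brands` rather than inspect the primary brand.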

Annex E (normative):
RTP payload format and file storage format for AMR and AMR-WB audio

The AMR and AMR-WB speech codec RTP payload, storage format and MIME type registration are specified in [11], 

Annex F (normative):
RDF schema for the PSS base vocabulary

<?xml version="1.0"?>

<!--
This document is the RDF Schema for streaming-specific vocabulary
as defined in 3GPP TS 26.234 Rel.5 (in the following "the
specification").

The URI for unique identification of this RDF Schema is
http://www.3gpp.org/profiles/PSS/ccppschema-PSS5

This RDF Schema includes the same information as the respective
chapter of the specification. Greatest care has been taken to keep
the two documents consistent. However, in case of any divergence
the specification takes precedence.

All references in this RDF Schema are to be interpreted relative to
the specification. This means all references using the form
[ref] are defined in chapter 2 "References" of the
specification. All other references refer to parts within that
document.

Note: This Schema has been aligned in structure and base
vocabulary to the RDF Schema used by UAProf [40].
-->

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<!-- ****************************************************************** -->

<!-- ***** Properties shared among the components ***** -->

<rdf:Description ID="defaults">
<rdfs:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="Streaming"/>
<rdfs:comment>
An attribute used to identify the default capabilities.
</rdfs:comment>
</rdf:Description>

<!-- ****************************************************************** -->

<!-- ***** Component Definitions ***** -->

<rdf:Description ID="Streaming">
<rdf:type resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<rdfs:subClassOf rdf:resource="http://www.wapforum.org/UAPROF/ccppschema-20010330#Component"/>
<rdfs:label>Component: Streaming</rdfs:label>
<rdfs:comment>
The Streaming component specifies the base vocabulary for
PSS. PSS servers supporting capability exchange should
understand the attributes in this component as explained in
detail in 3GPP TS 26.234 Rel. 5.
</rdfs:comment>
</rdf:Description>

<!-- **
** In the following property definitions, the defined types
** are as follows:
**
** Number: A positive integer
** [0-9]+
** Boolean: A yes or no value
** Yes|No
** Literal: An alphanumeric string
** [A-Za-z0-9/.\-_]+
** Dimension: A pair of numbers
** [0-9]+x[0-9]+
-->

<!-- ****************************************************************** -->

<!-- ***** Component: Streaming ***** -->

<rdf:Description ID="AudioChannels">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: This attribute describes the stereophonic capability of the natural audio device.
The only legal values are "Mono" and "Stereo".

Type: Literal
Resolution: Locked
Examples: "Mono", "Stereo"
</rdfs:comment>
</rdf:Description>

<rdf:Description ID="VideoPreDecoderBufferSize">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: This attribute signals if the optional video
buffering requirements defined in Annex G are supported. It also
defines the size of the hypothetical pre-decoder buffer defined in
Annex G. A value equal to zero means that Annex G is not
supported. A value equal to one means that Annex G is
supported. In this case the size of the buffer is the default size
defined in Annex G. A value equal to or greater than the default
buffer size defined in Annex G means that Annex G is supported and
sets the buffer size to the given number of octets. Legal values are all
integer values equal to or greater than zero. Values greater than
one but less than the default buffer size defined in Annex G are
not allowed.

Type: Number
Resolution: Locked
Examples: "0", "4096"
</rdfs:comment>
</rdf:Description>

<rdf:Description ID="VideoInitialPostDecoderBufferingPeriod">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: If Annex G is not supported, the attribute has no
meaning. If Annex G is supported, this attribute defines the
maximum initial post-decoder buffering period of video. Values are
interpreted as clock ticks of a 90-kHz clock. In other words, the
value is incremented by one for each 1/90 000 seconds. For
example, the value 9000 corresponds to 1/10 of a second of initial
post-decoder buffering. Legal values are all integer values equal
to or greater than zero.

Type: Number
Resolution: Locked

Examples: <VideoInitialPostDecoderBufferingPeriod>
9000
</VideoInitialPostDecoderBufferingPeriod>
</rdfs:comment>
</rdf:Description>

<rdf:Description ID="VideoDecodingByteRate">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: If Annex G is not supported, the attribute has no meaning. If Annex G is supported,
this attribute defines the peak decoding byte rate the PSS client is able to support. In other
words, the PSS client fulfils the requirements given in Annex G with the signalled peak decoding
byte rate. The values are given in bytes per second and shall be greater than or equal to 8000.
According to Annex G, 8000 is the default peak decoding byte rate for the mandatory video codec
profile and level (H.263 Profile 0 Level 10). Legal values are integer values greater than or equal
to 8000.

Type: Number
Resolution: Locked

Examples: <VideoDecodingByteRate>16000</VideoDecodingByteRate>
</rdfs:comment>
</rdf:Description>

<rdf:Description ID="MaxPolyphony">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: The MaxPolyphony attribute refers to the maximal polyphony
that the synthetic audio device supports as defined in [44]. Legal values are integers between 5
and 24.
NOTE: The MaxPolyphony attribute can be used to signal the maximum polyphony capabilities
supported by the PSS client. This is a complementary mechanism for the delivery of
compatible SP-MIDI content and thus the PSS client is required to support Scalable
Polyphony MIDI, i.e. Channel Masking defined in [44].

Type: Number
Resolution: Locked

Examples: <MaxPolyphony>8</MaxPolyphony>
</rdfs:comment>
</rdf:Description>

<rdf:Description ID="PssAccept">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Bag"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: List of content types (MIME types) the PSS
application supports. Both CcppAccept (SoftwarePlatform, UAProf)
and PssAccept can be used but if PssAccept is defined it has
precedence over CcppAccept and a PSS application shall then use
PssAccept.

Type: Literal (bag)
Resolution: Append

Examples: "audio/AMR-WB;octet-alignment,application/smil"
</rdfs:comment>
</rdf:Description>

<rdf:Description ID="PssAccept-Subset">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Bag"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: List of content types for which the PSS application
supports a subset. MIME types can in most cases effectively be
used to express variations in support for different media
types. Many MIME types, e.g. AMR-NB, have several parameters that
can be used for this purpose. There may exist content types for
which the PSS application only supports a subset and this subset
cannot be expressed with MIME-type parameters. In these cases the
attribute PssAccept-Subset is used to describe support for a
subset of a specific content type. If a subset of a specific
content type is declared in PssAccept-Subset, this means that
PssAccept-Subset has precedence over both PssAccept and CcppAccept.
PssAccept and/or CcppAccept shall always include the corresponding
content types for which PssAccept-Subset specifies subsets.
This is to ensure compatibility with those content servers that
do not understand the PssAccept-Subset attribute but do understand e.g. CcppAccept.

This is illustrated with an example. If PssAccept="audio/AMR",
"image/jpeg" and PssAccept-Subset="JPEG-PSS" then "audio/AMR"
and JPEG baseline are supported. "image/jpeg" in PssAccept is of no
importance since it is related to "JPEG-PSS" in PssAccept-Subset.

Subset identifiers and corresponding semantics shall only be defined by
the TSG responsible for the present document. The following values are defined:

- "JPEG-PSS": Only the two JPEG modes described in clause 7.5 of the present
document are supported.
- "SVG-Tiny"
- "SVG-Basic"

Legal values are subset identifiers defined by the specification.

Type: Literal (bag)
Resolution: Locked
Examples: "JPEG-PSS", "SVG-Tiny", "SVG-Basic"
</rdfs:comment>
</rdf:Description>



<rdf:Description ID="PssVersion">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: Latest PSS version supported by the client. Legal
values are "3GPP-R4", "3GPP-R5" and so forth.

Type: Literal
Resolution: Locked
Examples: "3GPP-R4", "3GPP-R5"
</rdfs:comment>
</rdf:Description>



<rdf:Description ID="RenderingScreenSize">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: The rendering size of the device's screen in units of
pixels. The horizontal size is given followed by the vertical
size. Legal values are pairs of integer values equal to or greater
than zero. A value equal to "0x0" means that there is no display or
that only textual output is supported.

Type: Dimension
Resolution: Locked
Examples: "160x120"
</rdfs:comment>
</rdf:Description>



<rdf:Description ID="SmilBaseSet">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: Indicates a base set of SMIL 2.0 modules that the
client supports. Legal values are the following pre-defined
identifiers: "SMIL-3GPP-R4" indicates all SMIL 2.0
modules required for scene description support according to clause
8 of Release 4 of TS 26.234. "SMIL-3GPP-R5" indicates all SMIL 2.0
modules required for scene description support according to clause
8 of the specification.

Type: Literal
Resolution: Locked
Examples: "SMIL-3GPP-R4", "SMIL-3GPP-R5"
</rdfs:comment>
</rdf:Description>



<rdf:Description ID="SmilModules">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Property"/>
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Bag"/>
<rdfs:domain rdf:resource="#Streaming"/>
<rdfs:comment>
Description: This attribute defines a list of SMIL 2.0 modules
supported by the client. If the SmilBaseSet is used those modules
do not need to be explicitly listed here. In that case only
additional module support needs to be listed. Legal values are all
SMIL 2.0 module names defined in the SMIL 2.0 recommendation [31],
section 2.3.3, table 2.

Type: Literal (bag)
Resolution: Locked

Examples: "BasicTransitions, MultiArcTiming"
</rdfs:comment>
</rdf:Description>

</rdf:RDF>
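The value types listed in the schema comment (Number, Boolean, Literal, Dimension) are given as regular expressions, so a receiver can check attribute values mechanically. The following is a minimal sketch of such a check. The function names are hypothetical, and the rule in `pre_decoder_buffer_size_is_legal` is only one reading of the VideoPreDecoderBufferSize description, with the default buffer size supplied by the caller.

```python
import re

# Patterns copied from the type definitions in the schema comment.
TYPE_PATTERNS = {
    "Number": re.compile(r"[0-9]+"),
    "Boolean": re.compile(r"Yes|No"),
    "Literal": re.compile(r"[A-Za-z0-9/.\-_]+"),
    "Dimension": re.compile(r"[0-9]+x[0-9]+"),
}


def matches_type(type_name: str, value: str) -> bool:
    """Return True if the whole value matches the pattern of the named type."""
    return TYPE_PATTERNS[type_name].fullmatch(value) is not None


def parse_dimension(value: str) -> tuple:
    """Split a Dimension such as "160x120" into (horizontal, vertical)."""
    if not matches_type("Dimension", value):
        raise ValueError("not a Dimension: " + repr(value))
    horizontal, vertical = value.split("x")
    return int(horizontal), int(vertical)


def pre_decoder_buffer_size_is_legal(value: int, default_size: int) -> bool:
    """Assumed reading: 0 and 1 are flags, other values must reach the default."""
    return value in (0, 1) or value >= default_size


print(matches_type("Literal", "3GPP-R5"), parse_dimension("160x120"))
```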
Annex G (normative):
Buffering of video

G.1 Introduction 

This annex describes video buffering requirements in the PSS. As defined in clause 7.4 of the present document, 
support for the annex is optional and may be signalled in the PSS capability exchange and in the SDP. This is described 
in clause 5.2 and clause 5.3.3 of the present document. When the annex is in use, the content of the annex is normative. 
In other words, PSS clients shall be capable of receiving an RTP packet stream that complies with the specified 
buffering model and PSS servers shall verify that the transmitted RTP packet stream complies with the specified 
buffering model. 



G.2 PSS Buffering Parameters 



The behaviour of the PSS buffering model is controlled with the following parameters: the initial pre-decoder buffering 
period, the initial post-decoder buffering period, the size of the hypothetical pre-decoder buffer, the peak decoding byte 
rate, and the decoding macroblock rate. The default values of the parameters are defined below. 

The default initial pre-decoder buffering period is 1 second. 

The default initial post-decoder buffering period is zero. 

The default size of the hypothetical pre-decoder buffer depends on the maximum video bit-rate,
as given in the table below:

Table G.1: Default size of the hypothetical pre-decoder buffer

Maximum video bit-rate      Default size of the hypothetical pre-decoder buffer
65536 bits per second       20480 bytes
131072 bits per second      40960 bytes
Undefined                   51200 bytes



The maximum video bit-rate can be signalled in the media-level bandwidth attribute of SDP as defined in clause
5.3.3 of this document. If the media-level bandwidth attribute is not present in the presentation description, the
maximum video bit-rate is defined according to the video coding profile and level in use.

The size of the hypothetical post-decoder buffer is an implementation-specific issue. The buffer size can be 
estimated from the maximum output data rate of the decoders in use and from the initial post-decoder buffering 
period. 

By default, the peak decoding byte rate is defined according to the video coding profile and level in use. For
example, H.263 Level 10 requires support for bit-rates up to 64000 bits per second. Thus, the peak decoding byte
rate equals 8000 bytes per second.

The default decoding macroblock rate is defined according to the video coding profile and level in use. If
MPEG-4 Visual is in use, the default macroblock rate equals the VCV decoder rate. If H.263 is in use, the default
macroblock rate equals (1 / minimum picture interval) multiplied by the number of macroblocks in the maximum
picture format. For example, H.263 Level 10 requires support for picture formats up to QCIF and a minimum
picture interval down to 2002 / 30000 sec. Thus, the default macroblock rate would be 30000 x 99 / 2002 ≈ 1484
macroblocks per second.
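The default values above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the names are hypothetical, the dictionary simply restates Table G.1 (with `None` standing for an undefined maximum video bit-rate), and bit-rates not listed in the table are deliberately left out because the table does not define them.

```python
from fractions import Fraction

# Table G.1: maximum video bit-rate (bits per second) to default buffer size (bytes).
DEFAULT_PRE_DECODER_BUFFER_BYTES = {65536: 20480, 131072: 40960, None: 51200}


def peak_decoding_byte_rate(max_bit_rate_bps: int) -> int:
    """Peak decoding byte rate in bytes per second for a given maximum bit-rate."""
    return max_bit_rate_bps // 8


def h263_default_macroblock_rate(min_picture_interval: Fraction,
                                 macroblocks_per_picture: int) -> Fraction:
    """(1 / minimum picture interval) multiplied by the macroblocks per picture."""
    return macroblocks_per_picture / min_picture_interval


# H.263 Level 10: 64000 bits per second, QCIF (99 macroblocks), 2002/30000 s.
byte_rate = peak_decoding_byte_rate(64000)
macroblock_rate = h263_default_macroblock_rate(Fraction(2002, 30000), 99)
print(byte_rate, float(macroblock_rate))
```

Using `Fraction` keeps the picture interval exact, so rounding happens only once, when the result is presented.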

PSS clients may signal their capability of providing larger buffers and faster peak decoding byte rates in the capability 
exchange process described in clause 5.2 of the present document. The average coded video bit-rate should be smaller 
than or equal to the bit-rate indicated by the video coding profile and level in use, even if a faster peak decoding byte 
rate were signalled.

Initial parameter values for each stream can be signalled within the SDP description of the stream. Signalled parameter
values override the corresponding default parameter values. The values signalled within the SDP description guarantee 
pauseless playback from the beginning of the stream until the end of the stream (assuming a constant-delay reliable 
transmission channel). 

PSS servers may update parameter values in the response for an RTSP PLAY request. If an updated parameter value is 
present, it shall replace the value signalled in the SDP description or the default parameter value in the operation of the 
PSS buffering model. An updated parameter value is valid only in the indicated playback range, and it has no effect 
after that. Assuming a constant-delay reliable transmission channel, the updated parameter values guarantee pauseless 
playback of the actual range indicated in the response for the PLAY request. The indicated pre-decoder buffer size and 
initial post-decoder buffering period shall be smaller than or equal to the corresponding values in the SDP description or 
the corresponding default values, whichever ones are valid. The following header fields are defined for RTSP: 

x-predecbufsize:<size of the hypothetical pre-decoder buffer> 

This gives the suggested size of the Annex G hypothetical pre-decoder buffer in bytes. 

x-initpredecbufperiod:<initial pre-decoder buffering period> 

This gives the required initial pre-decoder buffering period specified according to Annex G. Values are 
interpreted as clock ticks of a 90-kHz clock. That is, the value is incremented by one for each 1/90 000 seconds. 
For example, value 180 000 corresponds to a two second initial pre-decoder buffering. 

x-initpostdecbufperiod:<initial post-decoder buffering period> 

This gives the required initial post-decoder buffering period specified according to Annex G. Values are
interpreted as clock ticks of a 90-kHz clock.

These header fields are defined for the response of an RTSP PLAY request only. Their use is optional. 

The following example plays the whole presentation starting at SMPTE time code 0:10:20 until the end of the clip. The 
playback is to start at 15:36 on 23 Jan 1997. The suggested initial post-decoder buffering period is half a second. 

C->S: PLAY rtsp://audio.example.com/twister.en RTSP/1.0
CSeq: 833
Session: 12345678
Range: smpte=0:10:20-;time=19970123T153600Z
User-Agent: TheStreamClient/1.1b2

S->C: RTSP/1.0 200 OK
CSeq: 833
Date: 23 Jan 1997 15:35:06 GMT
Range: smpte=0:10:22-;time=19970123T153600Z
x-initpredecbufperiod: 45000
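A client that receives such a PLAY response has to pick out the three buffering header fields and convert the 90-kHz tick counts into seconds. The following is a minimal sketch of that step. The function names are hypothetical, the parsing is deliberately simplistic (one header per line, no continuation lines), and it is not a complete RTSP parser.

```python
CLOCK_HZ = 90_000  # the buffering periods are expressed as ticks of a 90-kHz clock

BUFFERING_HEADERS = (
    "x-predecbufsize",
    "x-initpredecbufperiod",
    "x-initpostdecbufperiod",
)


def parse_buffering_headers(response: str) -> dict:
    """Return the buffering header fields found in a PLAY response as integers."""
    found = {}
    for line in response.splitlines():
        name, separator, value = line.partition(":")
        name = name.strip().lower()
        if separator and name in BUFFERING_HEADERS:
            found[name] = int(value.strip())
    return found


def ticks_to_seconds(ticks: int) -> float:
    """Convert 90-kHz clock ticks to seconds."""
    return ticks / CLOCK_HZ


response = """RTSP/1.0 200 OK
CSeq: 833
Date: 23 Jan 1997 15:35:06 GMT
Range: smpte=0:10:22-;time=19970123T153600Z
x-initpredecbufperiod: 45000"""

for header, value in parse_buffering_headers(response).items():
    print(header, value, ticks_to_seconds(value))
```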



G.3 PSS server buffering verifier 



The PSS server buffering verifier is specified according to the PSS buffering model. The model is based on two buffers 
and two timers. The buffers are called the hypothetical pre-decoder buffer and the hypothetical post-decoder buffer. The 
timers are named the decoding timer and the playback timer. 

The PSS buffering model is presented below. 

1. The buffers are initially empty.

2. A PSS Server adds each transmitted RTP packet having video payload to the pre-decoder buffer immediately 
when it is transmitted. All protocol headers at RTP or any lower layer are removed. 

3. Data is not removed from the pre-decoder buffer during a period called the initial pre-decoder buffering period. 
The period starts when the first RTP packet is added to the buffer. 

4. When the initial pre-decoder buffering period has expired, the decoding timer is started from a position indicated 
in the previous RTSP PLAY request. 

5. Removal of a video frame is started when both of the following two conditions are met: First, the decoding timer 
has reached the scheduled playback time of the frame. Second, the previous video frame has been totally 
removed from the pre-decoder buffer.

6. The duration of frame removal is the larger one of the two candidates: The first candidate is equal to the number
of macroblocks in the frame divided by the decoding macroblock rate. The second candidate is equal to the
number of bytes in the frame divided by the peak decoding byte rate. When the coded video frame has been
removed from the pre-decoder buffer entirely, the corresponding uncompressed video frame is placed into the
post-decoder buffer.

7. Data is not removed from the post-decoder buffer during a period called the initial post-decoder buffering period. 
The period starts when the first frame has been placed into the post-decoder buffer. 

8. When the initial post-decoder buffering period has expired, the playback timer is started from the position 
indicated in the previous RTSP PLAY request. 

9. A frame is removed from the post-decoder buffer immediately when the playback timer reaches the scheduled 
playback time of the frame. 

10. Each RTSP PLAY request resets the PSS buffering model to its initial state. 

A PSS server shall verify that a transmitted RTP packet stream complies with the following requirements: 

The PSS buffering model shall be used with the default or signalled buffering parameter values. Signalled 
parameter values override the corresponding default parameter values. 

The occupancy of the hypothetical pre-decoder buffer shall not exceed the default or signalled buffer size. 

Each frame shall be inserted into the hypothetical post-decoder buffer before or on its scheduled playback time. 
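The buffering model and the verifier requirements above can be approximated by a small simulation. The following is a simplified sketch, not a normative implementation, and it rests on several assumptions that the text does not state: each frame is transmitted at a single instant, frames are listed in transmission and playback order, playback times are relative to the position given in the PLAY request, removal of a frame waits until the frame has been transmitted, and a frame counts towards the pre-decoder buffer occupancy until its removal has finished. All names are hypothetical.

```python
from dataclasses import dataclass

EPS = 1e-9  # tolerance for floating-point comparisons of times


@dataclass
class Frame:
    send_time: float      # when the server transmits the frame (seconds)
    playback_time: float  # scheduled playback time, relative to the PLAY position
    size_bytes: int       # coded size after removing RTP and lower-layer headers
    macroblocks: int      # number of macroblocks in the frame


def verify(frames, buffer_size, pre_period, post_period, byte_rate, mb_rate):
    """Return a list of violation messages; an empty list means none was found."""
    violations = []
    if not frames:
        return violations
    # Steps 3-4: decoding starts once the initial pre-decoder period has expired.
    decode_start = frames[0].send_time + pre_period
    removal_end = []
    previous_end = decode_start
    for f in frames:
        # Step 5, plus the assumption that removal waits for the frame to arrive.
        start = max(decode_start + f.playback_time, previous_end, f.send_time)
        # Step 6: removal takes the larger of the two candidate durations.
        previous_end = start + max(f.macroblocks / mb_rate, f.size_bytes / byte_rate)
        removal_end.append(previous_end)
    # Steps 7-8: playback starts after the initial post-decoder period.
    playback_start = removal_end[0] + post_period
    for i, f in enumerate(frames):
        # Assumption: a frame occupies the buffer until its removal has finished.
        occupancy = sum(g.size_bytes for g, end in zip(frames, removal_end)
                        if g.send_time <= f.send_time and end > f.send_time)
        if occupancy > buffer_size:
            violations.append(f"frame {i}: pre-decoder buffer overflow ({occupancy} bytes)")
        if removal_end[i] > playback_start + f.playback_time + EPS:
            violations.append(f"frame {i}: decoded late")
    return violations


frames = [Frame(0.0, 0.0, 2000, 99), Frame(0.5, 0.5, 2000, 99), Frame(1.0, 1.0, 2000, 99)]
print(verify(frames, buffer_size=20480, pre_period=1.0, post_period=0.0,
             byte_rate=8000, mb_rate=1484))
```

The example call uses the H.263 Level 10 defaults from clause G.2. A server-side check along these lines would be run against the planned transmission schedule before or while the packets are sent.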



G.4 PSS client buffering requirements 

When the annex is in use, the PSS client shall be capable of receiving an RTP packet stream that complies with the PSS 
server buffering verifier, when the RTP packet stream is carried over a constant-delay reliable transmission channel. 
Furthermore, the video decoder of the PSS client, which may include handling of post-decoder buffering, shall output 
frames at the correct rate defined by the RTP time-stamps of the received packet stream.

Annex H (informative):
Content creator guidelines for the synthetic audio medium type

It is recommended that the first element of the MIP (Maximum Instantaneous Polyphony) message of the SP-MIDI
content intended for synthetic audio PSS/MMS be no more than 5. For instance, the MIP figures {4, 9,
10, 12, 12, 16, 17, 20, 26, 26, 26} comply with the recommendation whereas {6, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26}
do not.
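A content creation tool could check this recommendation automatically. The following is a minimal sketch, assuming the MIP values are already available as a list of integers; the function name is hypothetical and extracting the values from the MIP message is not shown.

```python
def mip_follows_recommendation(mip_values) -> bool:
    """Return True if the first MIP element is present and no more than 5."""
    return len(mip_values) > 0 and mip_values[0] <= 5


print(mip_follows_recommendation([4, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26]))
print(mip_follows_recommendation([6, 9, 10, 12, 12, 16, 17, 20, 26, 26, 26]))
```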
Annex I (informative):
SP MIDI Device 5-24 Note Profile for 3GPP, SP-MIDI implementation guideline using non-compliant hardware

I.1 Introduction

This informative annex describes some implementation guidelines intended for the SP-MIDI device 5-24 Note Profile for
3GPP [45]. These guidelines allow manufacturers to develop early SP-MIDI
implementations using MIDI hardware available at the time of the approval of release 5. These guidelines are valid only
for release 5 implementations of SP-MIDI and are expected to be removed. It should be noted that these guidelines may
reduce the musical performance of the synthesiser depending on the content and should be used with extreme caution.



I.2 Guidelines

I.2.1 Support of multiple rhythm channels

Scalable Polyphony synthesisers conformant to this Profile shall support at least two MIDI Channels that can function 
as Rhythm Channels, to enable a fluent scalable polyphony implementation. 

If the two rhythm Channels are not natively supported by the MIDI hardware, the SP-MIDI player could redirect the
events intended for the additional rhythm channels to the default rhythm channel (MIDI channel 10). The rendering
of the SP-MIDI content should not be affected unless different Channel settings (e.g. Channel Volume, Bank Setting,
Panning etc.) are applied to the different rhythm Channels. It is recommended that only Channel settings intended for
the default rhythm channel be applied.

I.2.2 Support of individual stereophonic panning

When the stereophonic MIDI synthesiser cannot support individual stereophonic panning, central
panning should be used as the default instead.

Annex J (informative):
Mapping of SDP parameters to UMTS QoS parameters

This Annex gives recommendations for the mapping rules needed by the PSS applications to request the appropriate QoS
from the UMTS network (see Table J.1).

Table J.1: Mapping of SDP parameters to UMTS QoS parameters for PSS

QoS parameter                     Parameter value                   Comment
Delivery of erroneous SDUs        "No"
Delivery order                    "No"
Traffic class                     "Streaming class"
Maximum SDU size                  1400 bytes                        According to RFC 2460 the SDU size must
                                                                    not exceed 1500 octets. A packet size of
                                                                    1400 guarantees efficient transportation.
Guaranteed bit rate for downlink  1.025 * session bandwidth         This session bandwidth is calculated from the
                                                                    SDP media level bandwidth values.
Maximum bit rate for downlink     Equal to or higher than the
                                  guaranteed bit rate in downlink
Guaranteed bit rate for uplink    0.025 * session bandwidth
Maximum bit rate for uplink       Equal to or higher than the
                                  guaranteed bit rate in uplink
Residual BER                      1*10^-5                           16 bit CRC should be enough
SDU error ratio                   1*10^-4 or better
Traffic handling priority         Subscribed traffic handling       Ignored
                                  priority
Transfer delay                    2 sec.
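The bit-rate rows of Table J.1 reduce to two multiplications. The sketch below illustrates them. Taking the session bandwidth as the sum of the SDP media-level bandwidth values is an assumption made here for illustration, since the table only says that it is calculated from those values. The function name and the dictionary keys are hypothetical.

```python
def umts_guaranteed_bit_rates(media_bandwidths_kbps):
    """Return the guaranteed downlink and uplink bit rates in kbit/s.

    Assumption: session bandwidth = sum of the SDP media-level bandwidth values.
    Integer scaling (1025/1000 and 25/1000) avoids rounding noise from 1.025.
    """
    session_bandwidth = sum(media_bandwidths_kbps)
    return {
        "guaranteed_downlink": session_bandwidth * 1025 / 1000,
        "guaranteed_uplink": session_bandwidth * 25 / 1000,
    }


# Example: a 64 kbit/s video stream and a 16 kbit/s audio stream.
print(umts_guaranteed_bit_rates([64, 16]))
```

The maximum bit rates in each direction would then be chosen equal to or higher than these guaranteed values.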
Annex K (informative):